1 Introduction

This  report  summarizes  the  USC  Image  Understanding  research  projects  for  the  period  of 
June 1989 to September 1990 on Contract #F33615-87-C-1436. Along with our previous
annual reports (USCIRIS #238 and #258), it also constitutes our final report for the project.
This  report  consists  of  a  summary  section,  followed  by  a  number  of  detailed  technical  papers. 
These  papers  have  already  been  published  in  conference  or  workshop  proceedings;  to  save 
time  and  effort,  we  have  reproduced  these  papers  in  the  final  document  as  they  originally 
appeared.  The  work  in  these  detailed  papers  and  in  the  previous  reports  is  covered  only 
briefly  in  this  summary. 

Our  research  activity  under  this  contract  has  focussed  on  the  following  major  topics: 

•  3-D  Vision 

•  Mapping  from  Aerial  Images  and 

•  Parallel  Processing 

A  summary  of  these  areas  is  given  in  the  next  section. 

2 Summary

2.1 3-D VISION

Our  goal  is  to  develop  techniques  for  description  and  recognition  of  complex  3-D  objects  in 
complex  scenes.  We  focus  on  the  analysis  of  objects  using  shape  (as  opposed  to  color,  texture 
or  other  cues)  and  have  made  significant  progress  in  the  contract  period.  In  particular,  we 
have  concentrated  on  the  following: 


•  Range  image  analysis 

We  have  made  progress  in  the  automatic  acquisition  of  models  from  multiple  views, 
using  either  symbolic  or  iconic  representations.  These  models  are  useable  for  a  variety 
of  applications,  including  object  recognition  as  in  a  system  described  in  earlier  reports 
[14]. 


•  Stereo

   -  We have completed a system which combines area-based and feature-based
      processing to generate dense disparity maps.

   -  We have obtained excellent results in stereo matching using very high level
      primitives resulting from perceptual organization.

   -  In the special case of urban scenes, we have used “snakes” to accurately delineate
      the contours of building tops.

•  Shape  from  contour 

We  have  developed  a  theory  for  inferring  the  3-D  shape  of  objects  from  their  contours. 
This  technique  relies  on  observations  of  certain  types  of  symmetries  in  the  contours  and 
the  mathematical  constraints  that  derive  from  them.  Our  technique  uses  relatively  few 
assumptions  and  heuristics  and  is  largely  based  on  geometric  properties  of  contours. 
We have shown that it is applicable to the analysis of zero-Gaussian curvature surfaces,
straight homogeneous generalized cylinders, and “snakes”, and we are working on extending
it to yet more complex objects. Good results are obtained; however, we currently
assume that contours and symmetries are given to our system. In separate projects,
we are investigating the computation of such symmetries.

•  Symmetry  Detection  and  Perceptual  Grouping 

Grouping  of  contours  detected  in  an  image  is  crucial  for  proper  segmentation  and 
description  of  objects  in  a  scene.  In  our  previous  work,  we  found  that  symmetries 
play  a  key  role  in  computing  such  perceptual  groupings  [44,  31].  Symmetries  are  also 
central  to  our  technique  for  inferring  3-D  shape  from  contours.  In  recent  work,  we 
have  been  investigating  efficient  ways  of  computing  these  symmetries  [51].  Once  edge 
contours  are  represented  by  approximating  B-splines  and  the  corners  are  detected  [49], 
the computation of symmetries is of complexity O(n^2), where n is the number of spline
segments  as  opposed  to  the  number  of  points. 

•  Matching 

We  have  defined  a  methodology  based  on  efficient  coding  and  hash  tables  to  recognize 
objects in a cluttered environment, even when the number of models is large. We can
successfully  recognize  flat  objects  under  affine  transform,  and  3-D  objects  given  3-D 
data  (such  as  range  images),  with  no  restrictive  assumptions  on  the  shape  of  these 
objects. 

2.1.1  RANGE  IMAGE  ANALYSIS 

Range imagery differs from intensity imagery in that the input directly relates to the
geometric shape of the objects in the scene. Our previous work has allowed us to compute
symbolic  descriptions  of  range  images,  and  to  perform  matching  with  multi-view  models. 
Recently,  we  have  obtained  integrated  representations  of  models  from  multiple  views,  which 
is  more  natural  since  such  models  can  be  observed  offline  from  many  positions.  The  model 
building  procedure  is  performed  either  by  merging  at  the  data  level  prior  to  segmentation, 
or  by  merging  the  segmented  views,  as  explained  below. 

Range finders. We have two different range finding systems available to generate a range
map  of  a  given  3-D  object,  both  of  them  based  on  active  triangulation.  The  first  consists  of 
an  independent  laser  system  generating  a  sheet  of  light  projected  on  the  target  object,  which 
is  placed  upon  a  translation  or  a  rotary  table  driven  by  a  personal  computer.  This  computer 
includes  a  video  digitizer  board  with  two  CCD  cameras  looking  at  the  scene  from  both  sides 
of  the  sheet  of  light.  This  is  reported  in  detail  in  [21]  and  in  the  past  annual  report.  This 
low  cost  system  is  accurate  and  can  produce  a  registered  intensity  image  of  the  scene  along 
with the range image.

In the case where we cannot or do not wish to move the scene on a tray, we use a system
that  consists  of  a  nematic  liquid  crystal  mask  inserted  into  a  slide  projector  to  provide  an 
illumination  pattern  and  a  CCD  camera  looking  at  the  scene  from  a  different  angle.  The 
hardware  was  provided  courtesy  of  Prof.  S.  Inokuchi  from  Osaka  University,  and  the  details 
of the system can be found in [52]. The main advantage of this system is speed, since by
projecting a set of n Gray-coded patterns onto the scene, we obtain depths for 2^n lines.
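
The reason n patterns can label 2^n lines is illustrated by the minimal sketch below. It is an assumption-laden illustration, not the system of [52]: the function names (`gray_code`, `stripe_patterns`, `decode_line`) and the most-significant-bit-first pattern order are hypothetical choices made here. Each projector line is assigned the Gray code of its index, pattern k lights the lines whose code has bit k set, and the sequence of bright/dark observations at a pixel recovers the line index.

```python
def gray_code(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def stripe_patterns(n):
    """Return n binary patterns covering 2**n projector lines.

    Pattern k holds bit k (most significant first) of each line's Gray code,
    so neighbouring lines differ in exactly one pattern.
    """
    lines = 2 ** n
    return [[(gray_code(x) >> (n - 1 - k)) & 1 for x in range(lines)]
            for k in range(n)]


def decode_line(bits):
    """Recover the line index from the bits seen at one pixel (MSB first)."""
    gray = 0
    for bit in bits:
        gray = (gray << 1) | bit
    # Convert Gray code back to plain binary by folding in the shifted code.
    index = 0
    while gray:
        index ^= gray
        gray >>= 1
    return index
```

Because adjacent lines differ in a single pattern, a misread at a stripe boundary shifts the decoded index by at most one line, which is the usual motivation for Gray coding over plain binary coding.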

Data level merging. One of the difficulties of integrating multiple views is in finding an
accurate transformation between data obtained from different views. Previous research has
suggested determining the relative motion between views by placing marks and regular patterns
in the scene, taking intensity images at the same time and matching those features [59],
or by matching surface features directly [15]. These techniques rely solely on the accuracy
of  feature  detection  and  provide  no  feedback  from  the  data  themselves  as  to  how  well  the 
different  views  have  been  registered  under  the  estimated  transformation. 

Our  approach  is  to  use  range  data  directly  and  to  register  successive  views  of  the  object 
with  overlapping  areas  to  compute  transformations  for  the  relative  motion  between  views. 
To reduce the possibly large search space and ensure that the algorithm converges, we assume
that  the  approximate  transformation  between  the  data  from  two  views  is  known,  which  is 
reasonable  when  the  range  data  are  acquired  in  a  controlled  environment. 

Figure 1: Object Modeling. (a) Original image; (b) Reconstructed model; (c) Rendered image
of model; (d) Rendered image of model.

To register two overlapping range views of the object, we first choose a set of surface
points, called control points, from one range image, and then apply a minimization process
to find the rigid transformation which minimizes a distance measure from those control points
to the surface represented by the other range images. This minimization process is done by
using an iterative least-square method. The control points and the distance measure have
been chosen so that this process converges rapidly.
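
The general shape of such an iterative least-squares registration can be illustrated with a deliberately simplified sketch. It is not the method of this report: it works in 2-D rather than 3-D, and it substitutes a point-to-closest-point distance for the point-to-surface distance described above, which makes it an ICP-style stand-in. All function names are hypothetical. As in the text, it assumes the initial misalignment is small.

```python
import math


def closest_point(p, targets):
    """Nearest target point to p (brute force)."""
    return min(targets, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation and translation mapping src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_rigid(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def mean_sq_dist(points, targets):
    """Mean squared distance from each point to its closest target."""
    total = 0.0
    for p in points:
        q = closest_point(p, targets)
        total += (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return total / len(points)


def register(control_points, targets, iterations=20):
    """Alternate closest-point matching and least-squares alignment."""
    pts = list(control_points)
    for _ in range(iterations):
        matches = [closest_point(p, targets) for p in pts]
        theta, tx, ty = best_rigid_2d(pts, matches)
        pts = apply_rigid(pts, theta, tx, ty)
    return pts
```

Comparing `mean_sq_dist` before and after `register` gives the kind of data-driven feedback on registration quality that the feature-based techniques cited above do not provide.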

To  merge  multiple  views,  we  use  a  cylindrical/spherical  representation  for  simple  compact 
objects.  Successive  range  image  views  of  the  object  are  merged  after  being  mapped  to 
an  object-centered  coordinate  system  by  using  the  relative  transformations  found  by  the 
registration  process.  To  avoid  the  introduction  of  a  cumulative  error  term  in  the  integration 
process,  we  also  use  a  global  registration  strategy,  i.e.,  we  always  register  the  next  view  with 
the current integrated result. An example is illustrated in Figure 1, where the wood block
(a) has been viewed from 8 side positions 45° apart, and from the top and the bottom. The
reconstructed  views  of  the  object  are  shown  as  shaded  images  in  (c)  and  (d). 

Generating surface descriptions. In order to obtain useful surface descriptions, we need
both to devise a proper formalism using the criteria of richness, stability and local support,
and also to design proper implementation tools to deal with real images (noise, quantization
and digitization).

We  have  chosen  to  segment  range  images  into  simple  surface  patches,  whose  boundaries 
correspond to surface discontinuities (C0) or surface orientation discontinuities (C1). Each
surface  patch  is  then  approximated  globally  by  a  bivariate  quadratic  polynomial  [13].  This 
segmented representation of a scene may be viewed as a graph whose nodes capture information
about the individual surface patches and whose links represent the relationships between
them,  such  as  occlusion  and  connectivity.  Simple  reasoning  on  these  relationships  is  used 
to  decompose  the  full  graph  into  disjoint  subgraphs  corresponding  to  different  objects.  An 
example  is  shown  in  figure  2a-c. 
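
One simple way to picture the decomposition step is as a connected-components computation on the patch graph. The sketch below is an assumption, not the reasoning actually used in the system: it supposes that a "connected" link keeps two patches in the same object and that any other link (such as "occludes") separates them. The link labels and the function name `split_into_objects` are hypothetical.

```python
from collections import defaultdict


def split_into_objects(patches, links):
    """Group surface patches into objects.

    patches: iterable of patch identifiers (graph nodes).
    links:   iterable of (patch_a, patch_b, relation) triples.
    Only links labelled "connected" join patches; all other relations
    (for example "occludes") are treated as boundaries between objects.
    """
    adjacency = defaultdict(set)
    for a, b, relation in links:
        if relation == "connected":
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    objects = []
    for patch in patches:
        if patch in seen:
            continue
        # Depth-first traversal collects one connected subgraph.
        component, stack = set(), [patch]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        objects.append(component)
    return objects
```

For instance, four patches with links p1-p2 "connected", p2-p3 "occludes" and p3-p4 "connected" would be split into two candidate objects, {p1, p2} and {p3, p4}.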

Figure 2: Segmentation of a complex range image. (a) Original Scene (shaded); (b) Inferred
objects.

The success of this representation critically depends on our ability to compute the
necessary attributes, such as gradients and curvature, from an image in the presence of noise.
We have found adaptive smoothing to be a tool of great value for such operations. The
details can be found in [50, 49], but the ideas can be summarized as follows. The general
purpose of our Adaptive Smoothing scheme is to smooth a signal - whether it is an intensity
image, a range image or a planar curve - while preserving and even enhancing its
discontinuities. This is achieved by repeatedly convolving the signal with a very small averaging
filter modulated by a measure of the signal discontinuity at each point. A relatively small
number of iterations is needed to obtain a smooth signal suitable for feature extraction. In
range images, we use curvature features such as curvature extrema or zero-crossings, which
are easily detected and directly localized after Adaptive Smoothing, as opposed to Gaussian
Scale-Space approaches where a tedious tracking procedure is needed.
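
A 1-D version of this idea can be sketched as follows. The details are assumptions made for illustration and are not taken from [50, 49]: the discontinuity measure is a central-difference gradient, the modulating weight is exp(-g^2 / (2 k^2)), the averaging window is three samples wide, and the names `adaptive_smooth` and `k` are hypothetical.

```python
import math


def adaptive_smooth(signal, k=1.0, iterations=10):
    """Iteratively smooth a 1-D signal while preserving large jumps.

    Each sample is replaced by a weighted average of itself and its two
    neighbours. A sample's weight shrinks as the local gradient grows, so
    samples next to a discontinuity contribute almost nothing.
    """
    s = [float(v) for v in signal]
    n = len(s)
    for _ in range(iterations):
        # Central-difference gradient (one-sided at the two ends).
        grad = [(s[min(i + 1, n - 1)] - s[max(i - 1, 0)]) / 2.0 for i in range(n)]
        weight = [math.exp(-(g * g) / (2.0 * k * k)) for g in grad]
        out = []
        for i in range(n):
            num = den = 0.0
            for j in (i - 1, i, i + 1):
                if 0 <= j < n:
                    num += weight[j] * s[j]
                    den += weight[j]
            # Keep the old value if every weight underflowed to zero.
            out.append(num / den if den > 0.0 else s[i])
        s = out
    return s
```

The parameter `k` sets which gradient magnitudes count as discontinuities: jumps much larger than `k` are preserved, while variations much smaller than `k` are averaged out.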

3-D Object Recognition/Symbolic level merging. We have been able to use the above
descriptions to achieve successful recognition of complex objects in scenes containing
multiple objects that are only partially visible and are occluding each other. An example of
recognition is presented in figure 2(d), and a detailed treatment can be found in [12, 14].
For the purpose of matching, a model is represented by a set of similar descriptions from
multiple viewing angles, typically 4 to 6. Models can therefore be acquired and represented
automatically. Matching between objects in a scene and models is performed by three
modules: the screener, which finds the most likely candidate views for each object; the graph
matcher, which performs a detailed comparison between the potential matching graphs and
computes the 3-D transformation between them; and the analyzer, which takes a critical
look at the results and proposes links to split and merge object graphs.

Figure 3: Renault Part. (a) Intensity; (b) Disparity; (c) Disparity; (d) Features.

The alternative approach consists of generating a symbolic description, such as an
attributed graph, for each view, and then merging the different descriptions at this high level.
Each view is represented by a graph whose nodes are the individual surface patches and
whose links are the relationships between adjacent patches. The matching between views is
achieved either through a tree search procedure [14], or by a 2-level constraint satisfaction
network [40]. One of the difficulties to be overcome by this process is the inference of surface
patches from bounding contours, since these are not necessarily continuous and are generally
inaccurate at junctions. We have obtained good results by modeling this process as a dynamic
network subject to weak smoothness constraints. The initial state of the network consists
of the curves produced by low level operators, but these decay over time unless excited.
Possible completions provide this excitation, competing with each other and strengthening
existing curves [41].

2.1.2  STEREO 

We  are  using  different  approaches  to  solving  the  stereo  correspondence  problem,  including 
using  a  combination  of  area-based  and  feature-based  processing,  and  working  with  complex 
primitives  resulting  from  a  perceptual  grouping  process.  We  also  are  using  active  contours 
to  obtain  accurate  boundaries  of  roof  tops  in  aerial  views  of  urban  areas. 

Feature and area-based processing. We have considerably improved the system
described in earlier reports [9], which integrates area-based and feature-based processing, by
taking advantage of the unique attributes provided by each one separately. The area-based
processing generates a dense disparity map, and the feature-based processing accurately
locates discontinuities. The first improvement, described in [10], is the extraction of depth
and, in many cases, orientation discontinuities from the image.

Figure 3 shows the results obtained for the “Renault Part” stereo pair. Figure 3(a) and
(b) show one of the stereo intensity images and the respective disparity result; (c) shows a
3-D plot of the disparity, from which the surface features (d) were extracted. The surface
features located on the disparity surface are the depth discontinuities, the occluded regions,
and the concave and convex folds.

Figure 4: Books - Multi-resolution Pyramid. (a) Intensity; (b) Disparity.

Figure 5: Books - 3-D Plot of Disparity.

The second improvement is the use of a multi-level pyramid, first processing a reduced
(coarse) version of the image pair, and then propagating the results to another level for
higher-resolution (finer) processing, as shown in figure 4. This introduces a more global
context and allows the correction of local errors in matching, such as those due to photometric
and geometric distortions. Figure 5 shows a 3-D plot of the disparity and figure 6 shows
the extracted surface features. We have applied this Stereo Vision System to a wide variety
of scenes and obtained results which compare very favorably with state-of-the-art methods
[39, 18, 11].

Figure 7: Jussieu. (a) Intensity Images; (b) Disparity Images.

Stereo of aerial urban scenes. Current stereo algorithms, whether area-based or
feature-based, tend to fail around depth discontinuities, since these are the locations where
smoothness assumptions do not hold. This phenomenon is most easily observable in aerial views of
urban scenes, where the roofs of buildings can be detected, but not accurately delineated.
Fua [16] and Mohan [34] propose to solve the problem by restricting the possible shapes in
the form of a generic model.

Here  instead,  we  propose  to  use  the  initial  estimate  provided  by  a  traditional  stereo 
system  (as  described  in  the  last  section),  and  to  refine  it  by  enforcing  a  local  smoothness 
constraint. This is accomplished by an active contour model, whose details are given in this
report [30]. The estimate is shown in figures 7-9. We have obtained excellent results, even
when the boundaries contain corners, as illustrated in figure 10.

Figure 10: Example of delineation of building roofs with deformable contour models.

Stereo matching using high level features. We are also investigating an alternative
approach to stereo that uses high level features for correspondence. Lower level feature
matching may have difficulties with global correspondence, particularly when repetitive structures
are present, and requires the presence of rather dense texture and highly accurate knowledge of
epipolar geometry. High level feature matching can potentially overcome these obstacles. Further,
high level features are fewer in number and hence should be faster to match. However,
this approach has the deficiency that high level features need to be computed from
monocular images, a process that is known to be difficult and error prone. We have developed
sophisticated perceptual grouping methods to overcome this difficulty [31].

Figure 11: Results of a scene with multiple occlusions. (a) Left image; (b) Right image;
(c) Disparity output.

Our  first  experience  with  high  level  stereo  was  in  the  context  of  analyzing  buildings  in 
aerial  scenes.  In  such  scenes,  texture  (on  the  roofs)  is  very  sparse  and  disparity  changes 
discontinuously  at  the  boundaries.  We  found  that  using  high  level  features  (rectangles)  was 
very  effective  for  stereo  processing  of  such  scenes  [34].  We  next  investigated  generalization 
of  this  approach  to  scenes  where  the  object  shape  is  not  so  constrained  [34].  In  this  work, 
we  found  that  ribbons  (defined  by  two  symmetrical  curves  with  closures  at  the  two  ends) 
are  an  effective  method  for  organizing  the  curves  in  an  image  into  higher  level  features  and 
that  these  ribbons  could  be  used  for  stereo  matching.  This  work,  however,  concentrated  on 
the  grouping  problem  and  not  on  development  of  a  competent  stereo  system. 

In  our  recent  work,  we  have  been  building  on  our  perceptual  grouping  system  to  develop 
a  stereo  system.  Features  such  as  edgels,  curves,  symmetries,  and  ribbons  which  represent 
geometric  structures  of  objects  in  the  scene  are  extracted  from  each  image  using  perceptual 
grouping. The grouping algorithms are similar to those described in [34] but several
enhancements have been incorporated. A hierarchy of features from the left and right images
is matched using a relaxation network. Our method has shown accurate results for images
with  multiple  occlusions  and  wide  angle  disparities.  Results  from  this  method  are  illustrated 
in  figure  11. 

2.1.3  3-D  SHAPE  FROM  CONTOURS 

Humans  are  able  to  readily  perceive  3-D  shape  from  a  monocular  image.  Many  cues  are 
used  in  this  process  such  as  shading,  shadows  and  texture.  However,  we  believe  that  the 
most  significant  cue  is  the  shape  of  the  2-D  contours.  The  process  of  inferring  3-D  shape 
from  contours,  however,  has  proven  to  be  a  very  difficult  one.  We  believe  that  we  have  made 
a  major  advance  in  this  area  and  have  developed  a  theory  that  significantly  extends  the 
range of shapes that can be analyzed. Our theory relies on observations of symmetries in
the scene and the conjecture that only shapes having certain symmetries are perceived in
3-D by humans.

We define two types of symmetries that we call parallel and mirror symmetries (the
precise definitions are given in another paper in this report [58]). Given the observations of
these  symmetries  in  some  specific  combination,  we  can  infer  some  qualitative  properties  of 
surfaces  and  objects  in  the  scene,  such  as  whether  they  are  planar,  have  a  zero-Gaussian 
curvature  surface,  or  are  some  specific  classes  of  generalized  cylinders. 

Further,  the  contours  and  the  symmetries  allow  us  to  formulate  some  constraints  on  the 
quantitative  shape  of  the  surfaces  being  viewed.  The  constraints  that  derive  purely  from 
the  geometry  of  the  surface  are,  however,  not  sufficient  to  compute  the  precise  shape  of  the 
surface  and  leave  some  degrees  of  freedom  unconstrained.  These  degrees  of  freedom  can  also 
be  fixed  by  using  some  simple  perceptual  properties. 

Our technique is rather mathematical and hence difficult to summarize without
introducing a good deal of notation. Hence, we will only give references to the more detailed
work and show some examples. The basics of our method, and its applications to analysis of
zero-Gaussian  curvature  surfaces  are  given  in  [57].  Figure  12  shows  some  examples  from  this 
work.  The  first  column  of  this  figure  shows  the  input  contours  to  the  program,  the  middle 
column  shows  the  computed  surface  orientations  as  a  “needle  diagram”  and  the  last  column 
shows  the  surface  orientations  by  painting  the  surface  with  intensities  that  would  result  from 
a  Lambertian  surface  illuminated  by  a  point  source.  Extensions  of  our  method  to  straight 
homogeneous  generalized  cylinders  (SHGCs)  and  snakes  (generalized  cylinders  of  constant 
cross-section)  and  some  results  are  included  later  in  this  report  [58]. 
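
The shaded rendering in the last column of such a figure follows from the Lambertian model, in which intensity is proportional to the cosine of the angle between the surface normal and the light direction. The sketch below shows that computation only; it assumes a distant point source, unit albedo by default, and surface orientations supplied as normal vectors. The function names are hypothetical and are not taken from [57].

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Intensity of a Lambertian patch lit by a distant point source.

    Patches facing away from the light receive zero intensity.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    cos_angle = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_angle)


def shade(needle_map, light_dir):
    """Turn a 2-D grid of surface normals (a needle diagram) into intensities."""
    return [[lambertian_intensity(n, light_dir) for n in row] for row in needle_map]
```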

We  hope  that  these  examples  indicate  the  power  and  range  of  our  approach.  We  are 
in  the  process  of  further  developing  the  theory  to  apply  to  yet  more  complex  objects.  It 
should  be  noted  that  this  technique  assumes  that  the  appropriate  contours  and  symmetries 
are  given;  this  is  far  from  a  trivial  task.  However,  we  are  making  progress  on  detection  of 
the  appropriate  symmetries  in  other  projects  in  our  group  [33,  51]. 

2.1.4  SYMMETRY  DETECTION 

Once  edges  are  extracted,  the  resulting  contours  must  be  represented  for  further  reasoning. 
Iconic  representations  do  not  make  the  necessary  information  explicit:  by  definition  edgels 
only  capture  very  local  properties  of  an  image,  and  the  inference  of  higher  structures,  such 
as  object  boundaries,  requires  grouping  operations.  We  believe  that  such  operations  rely  on 
basic  and  simple  properties  and  various  forms  of  symmetry  [31].  The  representation  must 
therefore  make  explicit  differential  properties  of  contours,  such  as  tangent  and  curvature. 
Furthermore,  because  of  the  variability  inherent  in  the  imaging  process,  the  representation 
should  tolerate  noise,  partial  occlusion,  and  perspective,  thus  suggesting  segmented,  local 
descriptors  [45]. 

Figure 13: Detection of Elementary Parallel Symmetries. (a) Digital Curves; (b) B-spline
Approximation; (c) Lines of Symmetry; (d) Symmetry Axis.

If the world were composed of polyhedral objects alone, we would know to expect only
straight line segments in images, and polygonal approximations would be appropriate. In
many cases, such an approximation is indeed sufficient, as demonstrated by several
applications such as stereo [29], aerial image understanding [20] or object recognition [35, 53], but
it is unable to capture curvature information, since it is a first order approximation. Also, if a
contour is smooth, the number of points required to approximate it may be quite large, and
the exact position of the points somewhat unrelated to the contour itself. These issues have
been tackled by the graphics community in the context of design, and we propose to use some
of the resulting tools, particularly approximating B-splines. The resulting representation is
compact and faithful to the original data for smooth or piecewise smooth contours, open or
closed.

It  is  also  very  well  suited  for  the  detection  of  symmetries.  While  it  is  easy  to  define 
symmetry  between  two  infinite  straight  lines,  the  concept  of  symmetry  between  curves  is 
harder  to  define:  Rosenfeld  [47]  provides  a  lucid  account  of  the  differences  between  Blum’s  [6], 
Brooks’  [7],  and  Brady’s  [5]  definitions,  and  a  more  recent  paper  by  Ponce  [42]  gives  further 
comparisons.  Here,  we  are  interested  not  in  local  symmetries  which  provide  skeletal  shape 
primitives,  but  rather  in  symmetries  which  help  to  infer  shape  from  contour:  Nevatia  and 
Ulupinar  [56]  postulate  that  they  are  skewed  and  parallel. 

These can be computed efficiently using our B-spline representation. The main
advantages are the low computational complexity (O(n^2), where n is the number of spline segments
instead of the number of points) of the process and the stability of the results. Figure 13
shows an example of parallel symmetry detection using a quadratic B-spline approximation
starting from the two digital curves displayed in figure 13(a).
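
The O(n^2) figure comes from examining every pair of segments once. The sketch below shows only that pairwise structure, under strong simplifying assumptions: each segment is a straight line given by its two endpoints rather than a quadratic B-spline span, and two segments are called parallel when their directions agree within a tolerance. The actual symmetry test of [51] is not reproduced here, and the names `direction` and `parallel_pairs` are hypothetical.

```python
import math
from itertools import combinations


def direction(segment):
    """Orientation of a segment in [0, pi), ignoring which end comes first."""
    (x0, y0), (x1, y1) = segment
    return math.atan2(y1 - y0, x1 - x0) % math.pi


def parallel_pairs(segments, tolerance=math.radians(5.0)):
    """Return index pairs of segments whose orientations nearly agree.

    Every unordered pair is visited once, so the cost is O(n^2) in the
    number of segments, independent of how many points each one summarizes.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(segments), 2):
        diff = abs(direction(a) - direction(b))
        diff = min(diff, math.pi - diff)  # orientations wrap around at pi
        if diff <= tolerance:
            pairs.append((i, j))
    return pairs
```

Because a contour of several hundred edge points is typically summarized by a much smaller number of segments, the quadratic cost applies to that smaller number.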

As an application, for the very specific case of a torus, the detection of parallel symmetries
allows us to infer the 3-D orientation of the object in a much simpler fashion than proposed
in [43], as shown in figure 14.

Figure 14: Positioning of a Torus. (d) Intensity Image; (e) Parallel Symmetry;
(f) Positioning.


2.1.5  MATCHING 

Object  recognition  involves  identifying  a  correspondence  between  part  of  an  image  and  a 
particular  view  of  a  known  object.  This  requires  matching  the  image  against  stored  object 
models  to  determine  if  any  of  the  models  could  produce  a  portion  of  the  image.  We  have 
actively  promoted  the  idea  that  higher  level  features  organized  in  graphs  are  the  key  to 
recognition in the presence of occlusion and photometric variations [14, 28, 37]. Recently, we
have  addressed  the  issues  involved  in  recognizing  objects  in  a  cluttered  environment  when 
the  number  of  models  is  large.  We  have  been  able  to  show  excellent  results  for  the  recognition 
of  flat  objects  under  affine  transform  [53],  and,  in  a  paper  later  in  this  report,  of  3-D  objects 
given  3-D  data  [54].  The  keys  to  our  approach  are 

•  a  redundant  representation 

•  Gray  code  to  measure  semantic  difference 

•  hash  tables  for  fast  retrieval 

•  automatic  acquisition  of  models 

For the problem of recognition of multiple flat objects in a cluttered environment from an
arbitrary viewpoint [53], the models are acquired automatically and initially approximated
by polygons with multiple line tolerances for robustness. Groups of consecutive linear
segments (super segments) are then quantized with a Gray code and entered into a hash table.
This provides the essential mechanism for indexing and fast retrieval. Once the data base
of all models is built, the recognition proceeds by segmenting the scene into a polygonal
approximation; the Gray code for each super segment retrieves model hypotheses from the
hash table. Hypotheses are clustered if they are mutually consistent, and represent the
instance of a model. Finally, the estimate of the transformation is refined. This methodology
allows us to recognize models in the presence of noise, occlusion, scale, rotation, translation
and weak perspective. Unlike most of the current systems, its complexity grows as O(kN),
where N is the number of models and k << 1. An example of successful recognition is shown
in figure 17 in the aerial image section of this introduction.
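
The indexing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding of [53]: a super segment is represented here simply by the list of angles between its consecutive segments, each angle is quantized into one of 8 bins, and the bin number is Gray-coded so that neighbouring bins differ in one bit. Clustering of hypotheses and refinement of the transformation are omitted, and all names are hypothetical.

```python
from collections import defaultdict


def gray_code(index):
    """Binary-reflected Gray code: adjacent integers differ in one bit."""
    return index ^ (index >> 1)


def encode_super_segment(angles_deg, bins=8):
    """Quantize each angle into a bin and Gray-code the bin number."""
    width = 360.0 / bins
    return tuple(gray_code(int((a % 360.0) // width)) for a in angles_deg)


def build_table(models):
    """Hash table from code to the (model name, super segment index) entries."""
    table = defaultdict(list)
    for name, super_segments in models.items():
        for idx, angles in enumerate(super_segments):
            table[encode_super_segment(angles)].append((name, idx))
    return table


def vote(table, scene_super_segments):
    """Count, per model, how many scene super segments retrieve it."""
    votes = defaultdict(int)
    for angles in scene_super_segments:
        for name, _ in table.get(encode_super_segment(angles), []):
            votes[name] += 1
    return dict(votes)
```

Each scene super segment costs one table lookup regardless of how many models are stored, which is consistent with the sublinear growth in the number of models claimed above.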

For  the  recognition  of  3-D  objects  from  3-D  data,  we  use  a  data  structure  called  a 
splash,  which  describes  the  variation  of  surface  normals  in  a  circular  neighborhood  of  a 
point,  encoded  as  a  super  segment.  From  then  on,  the  matching  methodology  is  identical  to 
the  2-D  case.  The  full  details  can  be  found  later  in  this  report  [54]. 

2.2  AERIAL  IMAGE  ANALYSIS 

We  have  three  projects  for  the  analysis  of  images  of  aerial  scenes  including  efforts  to  develop 
modules that exhibit high performance by themselves, the integration of modules into
systems, and the formulation of a theory to define the underlying “visual abilities” required and
useful  for  extraction  of  cultural  features  from  images  of  aerial  scenes: 

•  The  focus  of  our  work  in  the  past  has  been  the  development  of  modules  for  detection 
and  description  of  cultural  (man-made)  features  present  in  aerial  scenes  such  as  the 
transportation network (fig. 15a,b) [19], building structures (fig. 16a,b,c) [20, 32, 31]
and aircraft (fig. 17a,b,c) [53]. In the past report we gave a detailed example of the
analysis on an airport complex. Later in this introduction, we will give an example
of  a  module  for  pier  and  ship  detection  from  an  image  of  a  harbor  complex. 

These  modules  typically  rely  on  the  perceptual  grouping  of  primitive  geometric  features 
(lines,  anti-parallels,  junctions,  portions  of  rectangles,  etc.)  extracted  from  the  images, 
to detect the objects. Modules for mobile objects such as aircraft and ships, on the
other hand, use models and rely on scale and rotation invariant matching techniques to
detect the objects. Our current work on 2-D and 3-D matching techniques is covered
in detail in [53, 54]. Typically these methods are applied at a stage where we have
a great deal of confidence that these objects are (or should be) present in the image.
For instance, after detection of runways, taxiways, and buildings, we can then look for
aircraft in the appropriate areas. These, in turn, help reinforce the runway and taxiway
hypotheses as well as help determine the functionality of some of the buildings.

•  A second portion of our work has concentrated on devising a system that manages
the modules and integrates their results, thus providing local and global
context as well as higher level reasoning suitable for the description of an entire complex
or scene. In the past we have concentrated on the domain of large commercial airports,
and developed modules for detecting major structures. Now we are investigating the
interaction of these modules. We hope to report on this work in a future paper.

(a) JFK Airport   (b) JFK Runways

Figure  15:  Runway  detection  module 


•  Our third project concentrates on the development of general techniques. These include
devising a taxonomy of perceptual grouping operations and the development of a
language for describing tasks in terms of grouping operations. We expand on these
topics in the following sections.

2.2.1  DEVELOPMENT  OF  GENERAL  TECHNIQUES 

We  believe  that  a  hierarchy  of  processing  steps  is  the  appropriate  approach  for  aerial  image 
understanding,  where  the  levels  of  the  hierarchy  are  chiefly  determined  by  three  factors: 

1.  The available sources of knowledge, both generic and domain specific. We know, for
instance, that airport runways are straight (geometry), and that they must have standard
markings (object specific) applied to the surfaces for safety and to aid pilots.

2.  The available image resolution and quality. For example, it is more desirable to look
for global features, such as harbor piers, at lower resolutions and then apply the
model-to-feature matching to small portions of high resolution images to locate the ships (see
below). Why? Because the pier areas are salient features, a collection of macro features
arranged in some simple geometric fashion along the boundary of two distinct regions,
land and water. The detection of ships, and perhaps their classification by type, on the
other hand, requires higher resolution and more symbolic processing.

3.  Measurements and assertions as a function of scale. What can or should be measured
at a given scale? Invariably we can get bogged down by considering everything possible
at all scales, and build complex and massive data structures. However, this is often
unreasonable for mapping and photointerpretation tasks where the image content and
typical resolutions quickly make such approaches unfeasible.

(a) Simple groupings   (b) Complex groupings   (c) Feature/Area stereo

Figure 16: Building detection modules

The characterization of such hierarchies is the focus of our work and involves two fundamental
issues: the development of a formal language to describe mapping and photointerpretation
(or other) tasks, and the development of a grouping theory to define the generic
“visual abilities” required to accomplish these tasks. We believe that many of these visual
abilities can be expressed in terms of generalized classes of perceptual grouping operations
that can be applied in parallel. Eventually the task descriptions should be given in terms
of (or compiled into) a sequence of alternating abstractions in the representation of the
features and application of classes of grouping operations. We explore some of these ideas
below using as an example the task to “detect pier areas and ships” from an aerial image of
a portion of a harbor scene.

Most of this work would fit at the “middle level” of perception. The “connection”
with the lower levels of processing is reflected by the fact that the grouping processes are
more “non-purposive,” and thus should be implemented to run in parallel. The connection
to  higher  levels  of  processing  (reasoning  about  segmented  objects,  where  an  object  is  a 
single,  functionally  identifiable  3-D  object,  as  determined  by  the  task  at  hand)  is  reflected 
by  grouping  processes  that  are  more  purposive,  operate  on  increasingly  abstract  features, 
and  are  sequential  in  nature. 

For a number of years our group has developed methods and techniques involving perceptual
organization. Groupings of near, parallel, collinear, co-curvilinear, and symmetric
features have been used to represent, segment and extract parts or whole objects from aerial
images and images of office scenes. For a reference on our most recent work see [31].

(a) LAX Airport   (b) Canny Edges   (c) Detected Aircraft

Figure 17: 2-D Matcher applied to image edges for aircraft detection


Recently  we  have  begun  work  towards  the  development  of  a  taxonomy  for  grouping 
operations, and here we only introduce informally the notion of grouping fields, a general
tool  for  describing  mathematically  the  visual  abilities  that  involve  perceptual  groupings  of 
visual  primitives  closer  to  the  lower  and  middle  levels  of  perception.  These  are  analogous  to 
the  ability  that  humans  have  to,  presumably  preattentively,  acquire  sensations  that  capture 
fundamental  and  basic  geometric  arrangements  of  image  elements  in  a  reflexive  manner. 

Briefly,  the  notion  of  a  grouping  field  is  analogous  to  force  fields  in  nature.  When  a  visual 
feature,  due  to  its  size,  shape,  or  other  property  induces  a  perceptual  grouping  with  other 
features  in  the  field  of  view,  we  say  that  a  grouping  field  exists  around  it.  Conversely,  any 
visual  feature  in  the  field  of  view  generates  a  grouping  field  which  is  a  function  of  the  feature 
properties  and  can  be  influenced  by  the  task  at  hand. 

We  believe  that  grouping  fields  will  be  useful  in  dealing  with  many  of  the  problems 
pointed out in previous work by [25, 26, 27, 55, 60] and others, that attempted to derive
computational  approaches  to  perceptual  organization  abilities. 

The combinatorial explosion that arises in attempting to establish relationships among
low level features purely on the basis of attribute processing is a major problem. For
photointerpretation tasks, at least, it seems that the way to avoid this is to explore the generality
aspects top-down, that is, by describing what we want (say, detect piers and ships) and, with
our own experienced knowledge of piers and ships, generating a task description that includes
the perception landmarks (first, detect the border between land and water regions; then detect
pier areas; next, detect ships in the neighborhood of pier areas; last, identify ships).

2.2.2  AN  EXAMPLE:  ANALYSIS  OF  HARBOR  COMPLEXES 

In  analyzing  a  harbor  complex  we  want  to  be  able  to  describe  the  buildings  in  the  port 
facility,  the  transportation  network  around  the  facilities,  and  of  course  the  pier  areas  and 
the  ships  in  the  area.  In  our  example  we  concentrate  on  the  piers  and  ships  and  the  grouping 
fields  and  grouping  operations  that  lead  to  the  detection  of  the  pier  areas.  We  then  briefly 
discuss  ship  detection  and  classification. 

What do we need to know about port and harbor facilities to detect the piers and describe
the ships? We need to know that the planning and design of port and harbor facilities is
strongly dependent on the characteristics of the ships to be served and the type of cargo to
be handled [61]. To eventually describe the scene completely we would need information about the
ships: main dimensions (length, beam, draft), cargo-carrying capacity, cargo-handling gear,
types of cargo units, shape, hull strength and motion characteristics, mooring equipment,
maneuverability, and so on.

To detect only the pier areas (where we later look for ships) we need only the upper
bounds on ship dimensions and the image resolution. These parameters are easily available
a priori and chiefly determine the extent and strength of the grouping fields associated with
the features. Let us define some grouping classes useful for this task:

•  Proximity-0D (Px0D): Groups nearby features without regard for the dimensions of
the features. Each feature, whether a dot, a line, a ship, or another suitable group,
generates a grouping field about its center of mass. The extent of the field (typically
circularly symmetric) is determined by the field of view or by the task as a function of
image resolution. Intersecting fields form a group with the same extent and a new
center of mass. The strength of the field is proportional to the “mass” (a function of
the complexity of the feature), and inversely proportional to the square of the distance
from the feature’s center of mass. Values are scaled according to resolution so that the
same two features at two different resolutions attract each other with the same force.

•  Proximity-1D (Px1D): Groups nearby features where a 1D attribute is dominant and
can be used to constrain membership. The strength of the field in this case would be
proportional to, and a function of, the attribute.

•  Proximity-ND  (PxND):  Groups  nearby  features  with  ND  attributes.  Each  attribute 
requires  one  layer. 

•  Parallelism with overlap (PlwO): Groups features that are parallel with respect to their
dominant orientations. Each allowed orientation determines a layer where the fields
of each feature having that orientation are active. For each orientation, intersecting
fields give all the features parallel to a given feature. The fields themselves have
an elliptical shape with the minor axis equivalent to the length of the feature in the
dominant orientation, and the major axis equivalent to the extent of the field of view
or constrained by the task. Note that allowing for angle tolerances is equivalent to
the union of fields across field layers.

(a) Image   (b) Segments and Apars

Figure 18: U.S. Navy Facility (512x512 image)

•  Parallelism  with  no  overlap  (PlwnO):  The  same  as  above  with  circularly  symmetric 
fields. 

•  Collinearity-0D (Co0D): Groups three or more features without regard for the spatial
extent of the feature. Any two of the three features determine the extent of the grouping
field, typically an ellipse with high eccentricity, centered about the center of mass of
the feature. The eccentricity determines the allowed tolerance in collinearity, and the
extent of the field is equivalent to the extent of the field of view. The orientation of
the two selected features determines a layer for field intersection. The steps in a ladder
have Co0D.

•  Collinearity-1D (Co1D): Groups two or more features with respect to their dominant
orientation. Each feature determines the extent of its grouping field (GF), also an ellipse
with high eccentricity. The eccentricity determines the allowed tolerance in collinearity, and the
extent of the field is equivalent to the extent of the field of view, or constrained by the
task at hand. The orientation of each feature determines a layer for field intersection.
The fragments of an airport runway have Co1D.
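
To make the first proximity class in the list above more concrete, the following Python sketch models each feature as a point mass and merges features whose circular fields intersect. It is only an illustration under stated assumptions: the names `Feature`, `field_strength` and `proximity_groups`, the conversion of distances to ground units, and the merge-until-stable loop are hypothetical and are not taken from the system described in this report.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Feature:
    x: float      # center of mass (pixels)
    y: float
    mass: float   # stand-in for the complexity of the feature (assumed > 0)


def field_strength(feature, px, py, meters_per_pixel=1.0):
    """Field strength of `feature` at pixel (px, py): mass / distance**2.

    Distances are converted to ground units so that the same two features
    give the same value at two different image resolutions (an assumption
    about how the resolution scaling could be done).
    """
    d = hypot(px - feature.x, py - feature.y) * meters_per_pixel
    return float("inf") if d == 0 else feature.mass / (d * d)


def proximity_groups(features, radius):
    """Merge features whose circular fields (all of the same radius) intersect.

    A merged group keeps the same field radius but gets a new, mass-weighted
    center of mass. Returns a list of (member_indices, x, y, mass) tuples.
    """
    groups = [([i], f.x, f.y, f.mass) for i, f in enumerate(features)]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                mi, xi, yi, wi = groups[i]
                mj, xj, yj, wj = groups[j]
                # Two circles of equal radius intersect when their centers
                # are at most one diameter apart.
                if hypot(xi - xj, yi - yj) <= 2 * radius:
                    w = wi + wj
                    groups[i] = (mi + mj, (xi * wi + xj * wj) / w,
                                 (yi * wi + yj * wj) / w, w)
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups
```

Each merge reduces the number of groups by one, so the loop always terminates. The other grouping classes would differ mainly in the shape of the field (elliptical rather than circular) and in the use of one layer per orientation or attribute.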

Let us now apply two of these definitions to our pier example. Figure 18(a) shows an
image of a portion of the U.S. Navy facilities in San Diego. We expect to see mostly military
ships that may require long term docking, thus allowing for double or triple docking. We
know the image resolution and the approximate ship dimensions, thus we know the minimum
size of the piers. The following gives the levels of the desired task:

0: Analyze Harbor Scene.
  1: Detect and classify buildings.
  1: Detect and classify access roads.
  1: Detect and classify ships.
    2: Detect ships.
    2: Locate ship repair/construction areas.
      3: Locate ships.
        4: Classify ships.
    2: Locate Pier areas.
      3: Locate boundary between land and water.
      3: Locate “land” structures in water.
      3: Detect pier areas.
      3: Locate ships.
        4: Classify ships.
      3: Describe ships.
    2: Describe piers.
  1: Describe piers and ships by class.
0: Describe harbor scene.

We  now  describe  the  task  at  level  2,  Locate  Pier  Areas: 

Locate Boundary between Land and Water: We detect the boundary between land
and water regions automatically using a region-based segmentation procedure [38]. In this
example we arbitrarily selected the largest region to represent the water region. Next we
approximate this boundary by piecewise linear segments (thick lines in fig. 18(b)) using
LINEAR [36].

Locate “land” Structures in Water: Contrary to many natural structures on the
shores, man-made structures appear highly geometric. We expect that most piers appear as
linear structures attached to the shore, and surrounded by water. Their linearity indicates
that the piers or portions of piers should be characterized by anti-parallel pairs of segments
of opposing contrast [36], or apars for short. Ships are typically docked parallel and adjacent
to the piers. We then expect that most of the line segments corresponding to sides of piers,
sides of ships, shadows, and so on in the neighborhood of the piers would result in many
apars. The constraint on the range of separations between pairs of segments (equivalent to
the width of the resulting apar) is a function of image resolution and ship dimensions. The
apars in our example, obtained using LINEAR, are shown as thin lines in fig. 18(b).
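
As a rough illustration of the apar constraint described above, the sketch below pairs line segments that run in opposite directions, overlap along their length, and are separated by a width within a given range. It assumes that the direction of each segment encodes its contrast sign (for instance, the brighter side always on the left), so that opposite directions mean opposing contrast. The function name `find_apars`, its parameters, and the angle tolerance are hypothetical; this is not the LINEAR implementation [36].

```python
from math import atan2, hypot, pi


def _direction(seg):
    (x1, y1), (x2, y2) = seg
    return atan2(y2 - y1, x2 - x1)


def _angle_between(a, b):
    """Absolute angle between two directions, in [0, pi]."""
    d = abs(a - b) % (2 * pi)
    return min(d, 2 * pi - d)


def find_apars(segments, min_width, max_width, angle_tol=0.1):
    """Return (i, j, width) for every anti-parallel pair of segments.

    Each segment is ((x1, y1), (x2, y2)) with nonzero length. The width
    range would be derived from image resolution and ship dimensions.
    """
    apars = []
    for i, s in enumerate(segments):
        (px, py), (qx, qy) = s
        length = hypot(qx - px, qy - py)
        ux, uy = (qx - px) / length, (qy - py) / length  # unit vector along s
        for j in range(i + 1, len(segments)):
            t = segments[j]
            # Anti-parallel: the two directions differ by about pi.
            if pi - _angle_between(_direction(s), _direction(t)) > angle_tol:
                continue
            (ax, ay), (bx, by) = t
            # Perpendicular distance from the midpoint of t to the line of s.
            mx, my = (ax + bx) / 2 - px, (ay + by) / 2 - py
            width = abs(ux * my - uy * mx)
            # Overlap of t's projection with s, measured along s.
            a = (ax - px) * ux + (ay - py) * uy
            b = (bx - px) * ux + (by - py) * uy
            overlap = min(length, max(a, b)) - max(0.0, min(a, b))
            if overlap > 0 and min_width <= width <= max_width:
                apars.append((i, j, width))
    return apars
```

For example, two opposite horizontal segments three pixels apart form an apar when the allowed width range is 2 to 5 pixels, while a pair only half a pixel apart is rejected.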

Detect Pier Areas: The apars are easily classified into land or water using the detected
water region. Subsequent processing operates on the land apars only. Next, we apply Px0D
grouping. The extent of the fields is task-dependent and does not have to be precisely
determined. At the resolution in our example (about 8 meters per pixel), the field radius is
roughly equivalent to a pier width plus the width of three destroyers on both sides of the
piers, or about 16 pixels.

(a) Px0D - Proximity grouping   (b) Selected proximity groups

Figure 19: Selected proximity groups

These fields (fig. 19(a)) occupy a single layer. Each field intersection operation shifts
the center of mass of the group; however, the field associated with the group has the same
properties as the individual apar fields. We then select the groups so that apar membership
is exclusive by extracting the groups in order of decreasing mass (number of apars). The
resulting groups (fields) represent potential pier fragments (fig. 19(b)).
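
The selection step above can be sketched as a greedy loop: take the group with the most apars, remove its apars from every other candidate, and repeat. This is one plausible reading of "exclusive membership in order of decreasing mass"; the function name `select_exclusive_groups` and the choice to shrink (rather than discard) overlapping candidates are assumptions.

```python
def select_exclusive_groups(candidate_groups):
    """Pick groups in order of decreasing mass so that no apar is used twice.

    candidate_groups: iterable of collections of apar ids, possibly overlapping.
    Returns a list of sorted id lists, heaviest group first.
    """
    remaining = [set(g) for g in candidate_groups]
    selected = []
    while True:
        remaining = [g for g in remaining if g]   # drop exhausted candidates
        if not remaining:
            break
        best = max(remaining, key=len)            # mass = number of apars
        selected.append(sorted(best))
        # Remove the chosen apars everywhere (this also empties `best`).
        remaining = [g - best for g in remaining]
    return selected
```

With candidates {1, 2, 3}, {3, 4} and {5}, the sketch returns [1, 2, 3], then [4], then [5]: apar 3 stays with the heavier group.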

At any resolution, we expect that the lines are fragmented and incomplete, due to
inefficiency in the line detection process or real structures in the image. Thus, we expect
that the resulting groups represent pier fragments rather than complete pier areas. Since we
expect the pier sections to be straight, the next step calls for collinearity grouping to join
possible fragmented pier areas. Note that the groups in fig. 19(b) are easily perceived as
being collinear.

We choose to represent the groups of apars by apars as well, having a length and width
equal to the diameter of the final field. The orientation of the apar is given by the dominant
orientation (the largest peak in the length-weighted histogram of the orientation) of the
apars in the group (see the arrows in fig. 20(a)).
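
The dominant orientation defined above (largest peak of a length-weighted orientation histogram) can be sketched as follows. The bin width of 10 degrees, the treatment of orientation as undirected (modulo 180 degrees), and the name `dominant_orientation` are assumptions made for illustration.

```python
from math import atan2, degrees, hypot


def dominant_orientation(segments, bin_width_deg=10):
    """Center (in degrees) of the heaviest bin of a length-weighted histogram.

    segments: list of ((x1, y1), (x2, y2)).
    bin_width_deg: integer bin width that divides 180 evenly.
    """
    nbins = 180 // bin_width_deg
    hist = [0.0] * nbins
    for (x1, y1), (x2, y2) in segments:
        length = hypot(x2 - x1, y2 - y1)
        # Undirected orientation in [0, 180): a segment and its reverse agree.
        theta = degrees(atan2(y2 - y1, x2 - x1)) % 180.0
        # The extra modulo guards against theta rounding up to exactly 180.
        hist[int(theta // bin_width_deg) % nbins] += length
    peak = max(range(nbins), key=hist.__getitem__)
    return (peak + 0.5) * bin_width_deg
```

Weighting by length means that a few long segments outvote many short, noisy ones, which is the property the text relies on when summarizing a group by a single apar.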

Next we apply Co1D to the pier area fragments. The longest piers are about three times
the length of a destroyer; thus we allow the extent of the elliptic fields (see fig. 20(a)) to
be up to three times the apar length, and have a width equivalent to the apar width (or group
radius).

The  result  of  the  grouping  is  then  represented,  again  by  apars,  which  in  turn  represent 
potential  pier  areas  (see  fig.  20(b)).  These  are  described  by  their  approximate  length  and 
position,  and  are  used  to  extract  image  windows  from  a  high  resolution  image  of  the  scene 
where  we  look  for  ships. 


(a) Potential Fragments and Co1D fields   (b) Detected Pier Area

Figure  20:  Pier  Detection  Fragments  and  Low  Resolution  Results 


Locate ships: We have performed preliminary experiments to detect the ships in high
resolution windows using the same matching technique that we used to detect aircraft [54].
One of these windows is shown in figure 21a with the adaptively smoothed [8] boundaries
shown in figure 21b. Three coarse-to-fine models of a single and a double destroyer group
were matched against these edges to obtain the detected ships in figures 21c,d.

Classify ships: We consider our ship detection results preliminary. The simplicity of the
ship’s shape is a disadvantage for the matching technique. The double ship configurations
are easier to match for the same reason. For ship identification, better ship boundaries are
required. We plan to apply a technique for boundary refinement using B-snakes [30]. The
matching technique can then be applied with finer models for more accurate ship classification.
Other alternatives for ship detection include stereo processing of these high resolution
windows with an area/feature based technique [9], also followed by boundary refinement and
2D matching.

2.3  PARALLEL  PROCESSING 

As shown in the previous sections, we are making good progress in solving some difficult
image understanding problems. However, one major obstacle remains in applying our methods
in practice, namely that of processing speed. Our algorithms, when run on a conventional
serial computer (such as a Symbolics 3600 or a Sun 3 or 4 series), can take several minutes
or even hours to complete. We believe that this long execution time and its related
computational complexity are inherent in the solution to the problems and hence we must devise
ways of applying additional computing power to our algorithms. This naturally leads to the
study of parallel computation.


(a) Ships 160x160 image window   (b) Canny edges   (c) Single ship match   (d) Double ship match

Figure 21: Fast 2D model-based matcher applied to edges of ships


There has been significant recent activity in applying parallel processing to image
understanding problems. However, much of this activity focuses on numerical computations
applied to iconic data structures. While such computations are necessary and useful, they
are not nearly sufficient. Our approach to image understanding is firmly based on use of
symbolic representations and symbolic computations. Parallelizing such computations is
significantly more complex than for iconic, numerical computations and therefore is the focus
of our parallel processing research.

Our  work  has  included  both  the  implementation  of  known  computer  vision  algorithms 
on a parallel machine (the Connection Machine [17], which is a Single Instruction Multiple
Data  (SIMD)  machine  having  between  16k  and  64k  processors),  and  the  analysis  of  general 
techniques  for  implementing  image  understanding  algorithms  on  parallel  architectures. 

Several algorithms have been implemented to evaluate the capabilities of the parallel
system. The first is Adaptive Smoothing, which is an edge preserving image smoothing
algorithm in which we iteratively convolve the image with a mask whose coefficients reflect
the degree of continuity of the underlying image surface [50, 49]. With a Vax front end and
the parallel Lisp implementation, each iteration takes about 50 msecs on a 256 x 256 x 8-bit
image. In order to compare the performance to the algorithm on a serial machine, we
implemented the Adaptive Smoothing on a Symbolics 3645, where it takes about 40 seconds
for each iteration. Thus the speedup we get from the Connection Machine over the serial
implementation is a factor of about 800, or roughly three orders of magnitude. Using the
adaptive smoothing system in a multiple scale stereo matching system based on Drumheller
and Poggio [11] greatly reduces the number of possible matches at each scale and obtains a
dense disparity map at fine scale.
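
A minimal serial sketch of one common formulation of adaptive smoothing is given below, in plain Python so that it is self-contained. Each pixel is replaced by a weighted average of its 3 x 3 neighborhood, where the weight of a neighbor decays with the local gradient magnitude, so that averaging does not cross strong edges. The exponential weight, the parameter `k`, the central-difference gradient and the border replication are assumptions made here for illustration; the formulation actually used is described in [50, 49].

```python
from math import exp


def adaptive_smooth(image, k=10.0, iterations=1):
    """Edge-preserving smoothing of a 2-D image given as a list of rows.

    k sets which gradient magnitudes count as edges: gradients much larger
    than k get a weight near 0 and are therefore not averaged across.
    """
    h, w = len(image), len(image[0])
    img = [[float(v) for v in row] for row in image]

    def at(a, y, x):
        # Replicate border pixels outside the image.
        return a[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    for _ in range(iterations):
        # Continuity weight: near 1 on smooth surfaces, near 0 on edges.
        wt = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                gx = (at(img, y, x + 1) - at(img, y, x - 1)) / 2.0
                gy = (at(img, y + 1, x) - at(img, y - 1, x)) / 2.0
                wt[y][x] = exp(-(gx * gx + gy * gy) / (2.0 * k * k))
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                num = den = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        c = at(wt, y + dy, x + dx)
                        num += c * at(img, y + dy, x + dx)
                        den += c
                # Assumes at least one neighbor has a nonzero weight.
                out[y][x] = num / den
        img = out
    return img
```

Every output pixel depends only on a small neighborhood of the previous iteration, which is why the computation maps naturally onto a SIMD machine with one pixel per processor.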

In our work on parallel techniques for image understanding we have studied several storage
and data access problems arising in mapping image algorithms onto parallel machines
and parallel implementations of techniques developed by our group on hypercube and mesh based
architectures, and have continued our efforts in parallel computations on reconfigurable VLSI
arrays and reduced meshes [1, 2]. (This work has been partially supported by AFOSR under
grant AFOSR-89-0032.) We have also studied memory access systems that achieve constant
time access to rows, columns, diagonals and subarrays using a minimum number of memory
modules [22].

We have chosen some specific and representative medium and high level image understanding
algorithms that we have found to be of general utility and are studying their
mapping onto suitable parallel architectures. Our goal is not only to map these specific
algorithms, but also to learn how to parallelize classes of symbolic algorithms. One specific
algorithm we have focused on is a “relaxation labelling” algorithm [28]. We have found this
algorithm to be useful in a variety of tasks in our work at USC; relaxation labelling has also
been used by many other researchers elsewhere [48].

We have obtained several efficient parallel implementations of discrete relaxation techniques
on a class of parallel architectures [24]. Using these approaches, stereo matching and
other labeling problems can be solved. First, a sequential algorithm for discrete relaxation
that is faster than traditional approaches is developed. This algorithm is then parallelized
and mapped onto a bus-connected parallel architecture. This mapping leads to a parallel
execution time of O(nm) using nm processors for consistently labeling n objects with m
labels. Two versions of this design are developed: one for special-purpose VLSI implementation
and the other for general-purpose parallel architectures. The stereo matching technique
developed in [28] can then be modified to lead to an efficient parallel implementation based
on the proposed solution.
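
For readers unfamiliar with discrete relaxation, the sketch below shows its simplest sequential form: a label is discarded from an object whenever some related object has no remaining label compatible with it, and the process repeats until nothing changes. This is the textbook procedure, not the faster sequential algorithm or the parallel mapping of [24]; the function and argument names are hypothetical.

```python
def discrete_relaxation(labels, neighbors, compatible):
    """Prune inconsistent labels until a fixed point is reached.

    labels:     dict mapping each object to its set of candidate labels
    neighbors:  dict mapping each object to the objects it is related to
    compatible: function (i, a, j, b) -> bool, True if label a on object i
                is consistent with label b on object j
    Returns a new dict of pruned label sets.
    """
    labels = {i: set(ls) for i, ls in labels.items()}
    changed = True
    while changed:
        changed = False
        for i in labels:
            for a in list(labels[i]):
                # Label a survives only if every neighbor still has at
                # least one label that supports it.
                unsupported = any(
                    not any(compatible(i, a, j, b) for b in labels[j])
                    for j in neighbors.get(i, ())
                )
                if unsupported:
                    labels[i].discard(a)
                    changed = True
    return labels
```

With n objects and m labels there are nm (object, label) pairs to test, which is consistent with the nm processors mentioned above if one imagines a processor per pair; that correspondence is an interpretation, not a description of the design in [24].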

The usual approach to parallel processing is to choose a specific architecture (based on
considerations of availability as well as suitability) and then attempt to map the given
algorithm onto it. This often leads to complex implementations that are difficult to understand
and put a severe burden on the programmer. In recent work, we are taking an alternative
approach of using a flexible architecture where the architecture can be modified to suit the
data flow requirements of the algorithm. Flexible architectures are becoming feasible design
solutions as commercial processing elements that support parallel processing, such as the
Transputer, are becoming available. Efficient parallel implementation can be achieved while
maintaining the structure of the program much as it is for the serial implementation. That
is, parallel efficiency can be obtained while maintaining algorithm simplicity and keeping
the programmer burden low. We have succeeded in demonstrating this approach for the
relaxation algorithm; this work is described more fully in [46]. In future work, we intend to
examine more complex algorithms and complete systems with this approach.

In another project, we are studying processor-time tradeoffs. These are of fundamental
importance in understanding the complexity and performance of parallel computations.
Driven by technological limitations, hardware cost, and flexibility, several schemes have been
proposed for implementing large size computations on parallel architectures of fixed size, or
on architectures having a reduced number of processors. The major goal of such schemes
is to keep the number of processors (or the processing chip-area, if implemented in VLSI)
independent of the problem size and subject only to hardware cost and other practical
considerations. Such considerations are particularly important for problems on digitized
images. With increasing image resolution, a processor array for a 1024 x 1024 image with
a fixed number of pixels, say 8, per processor requires more than 10^5 processors. Design
and implementation of such large arrays may be prohibitive, in addition to dealing with I/O
limitations, programming and testing methodologies. Furthermore, if this array is required
to handle larger size images, say images of size 2048 x 2048, then processor-time trade-offs
must be addressed again.
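
The processor counts in the example above follow from simple arithmetic, shown here as a small helper (the function name is hypothetical):

```python
def processors_needed(width, height, pixels_per_processor):
    """Processors required when each one handles a fixed number of pixels."""
    total_pixels = width * height
    return -(-total_pixels // pixels_per_processor)  # ceiling division


print(processors_needed(1024, 1024, 8))  # 131072, i.e. more than 10^5
print(processors_needed(2048, 2048, 8))  # 524288, four times as many
```

Doubling the image side quadruples the processor count at a fixed number of pixels per processor, which is why the trade-off has to be revisited for each image size.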

Direct mapping of parallel techniques from a specific organization onto a smaller version
of the same organization generally does not lead to a linear processor-time trade-off. New
techniques based on combining efficient parallel and sequential algorithms must be developed.
We have considered several parallel architectures with a large memory and a reduced
number of processors for parallel image computations [3, 4]. The memory size is proportional
to the image size. However, the number of processors can be varied over a wide range
while maintaining processor-time optimal performance. Architectures considered include
the reduced mesh of trees (RMOT), mesh-connected modules (MCM), linear arrays,
two dimensional meshes, hypercubes, and shuffle-exchange networks. An alternate cost-effective
parallel architecture, designated window architecture, is proposed for image understanding
applications [23]. This architecture consists of a small number of processors with mesh
connections and a large external memory with a simple processor-memory access scheme. Parallel
solutions for several image understanding problems, such as image labeling, computing
image transforms, computing geometric properties, and image and stereo matching using high level
primitives such as line segments, have been derived on this architecture [23].


References 

[1]  H. Alnuweiri and V. K. Prasanna-Kumar. Fast image labelling using local operators on
mesh connected computers. In Proceedings of the International Conference on Parallel
Processing, 1989.

[2]  H. Alnuweiri and V. K. Prasanna-Kumar. Optimal image computations on reduced
VLSI arrays. IEEE Transactions on Circuits and Systems, 1989.

[3]  H. Alnuweiri and V. K. Prasanna-Kumar. Optimal image algorithms on an orthogonally
connected memory architecture. In Proceedings of the International Conference on
Pattern Recognition, pages 350-355, Atlantic City, New Jersey, June 1990.

[4]  H. Alnuweiri and V. K. Prasanna-Kumar. Optimal image computations on reduced
processor parallel architectures. Parallel Architectures and Algorithms for Image
Understanding, 1990. Academic Press.

[5]  H.  Asada  and  M.  Brady.  The  curvature  primal  sketch.  In  Proceedings  of  the  IEEE 
Workshop  on  Computer  Vision:  Representation  and  Control,  pages  8-17,  Annapolis, 
Maryland,  May  1984. 

[6]  H.  Blum.  A  Transformation  for  Extracting  New  Descriptors  of  Shape.  MIT  Press, 
Cambridge,  MA,  1967. 

[7]  R. A. Brooks. Symbolic reasoning among 3-D models and 2-D images. Artificial
Intelligence, 17:285-349, 1981.

[8]  J.-S.  Chen.  Accurate  Edge  Detection  for  Multiscale  Processing.  PhD  thesis,  University 
of  Southern  California,  1989. 

[9]  S.  D.  Cochran  and  G.  Medioni.  Accurate  surface  description  from  binocular  stereo.  In 
Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Palo  Alto,  California,  May 
1989. 

[10]  S.  D.  Cochran  and  G.  Medioni.  Accurate  surface  description  from  binocular  stereo. 
In  Proceedings  of  the  Workshop  in  Interpretation  of  3D  Scenes,  pages  16-23,  Austin, 
Texas,  November  27-29  1989. 

[11]  M.  Drumheller  and  T.  Poggio.  On  parallel  stereo.  In  Proceedings  of  the  IEEE  Conference 
on  Robotics  and  Automation,  pages  1439-1448,  San  Francisco,  California,  April  1986. 

[12]  T.-J.  Fan.  Describing  and  Recognizing  3-D  Objects  Using  Surface  Properties.  PhD 
thesis,  University  of  Southern  California,  August  1988.  Technical  Report  IRIS-237. 

[13]  T.-J.  Fan,  G.  Medioni,  and  R.  Nevatia.  Segmented  descriptions  of  3-D  surfaces.  IEEE 
Journal  of  Robotics  and  Automation,  RA-3(6):527-538,  December  1987. 

[14] T.-J. Fan, G. Medioni, and R. Nevatia. Recognizing 3-d objects using surface
descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence,
11(11):1140-1157, November 1989.

[15]  F.P.  Ferrie  and  M.D.  Levine.  Integrating  information  from  multiple  views.  In  Proceedings 
of  the  IEEE  Workshop  on  Computer  Vision,  pages  117-122,  December  1987. 

[16]  P.  Fua  and  Y.  G.  Leclerc.  Model  driven  edge  detection.  In  Proceedings  of  the  DARPA  Im¬ 
age  Understanding  Workshop,  volume  2,  pages  1016-1021,  Cambridge,  Massachusetts, 
April  1988. 

[17]  D.  Hillis.  The  Connection  Machine.  MIT  Press,  Cambridge,  Massachusetts,  1985. 




[18]  W.  Hoff  and  N.  Ahuja.  Surfaces  from  stereo:  Integrating  feature  matching,  disparity 
estimation,  and  contour  detection.  IEEE  Transactions  on  Pattern  Analysis  and  Machine 
Intelligence,  11(2):121-136,  February  1989. 

[19]  A.  Huertas,  W.  Cole,  and  R.  Nevatia.  Detecting  runways  in  complex  airport  scenes. 
Computer  Vision,  Graphics,  and  Image  Processing,  August  1990. 

[20]  A.  Huertas  and  R.  Nevatia.  Detecting  buildings  in  aerial  images.  Computer  Vision, 
Graphics, and Image Processing, 41(2):131-152, February 1988.

[21]  J.  L.  Jezouin,  P.  Saint-Marc,  and  G.  Medioni.  Building  an  accurate  range  finder  with 
off  the  shelf  components.  In  Proceedings  of  the  Conference  on  Computer  Vision  and 
Pattern  Recognition,  pages  195-202,  Ann  Arbor,  Michigan,  June  1988. 

[22]  K.  Kim  and  V.  K.  Prasanna-Kumar.  Parallel  memory  systems  for  image  processing.  In 
Proceedings  of  the  Conference  on  Computer  Vision  and  Pattern  Recognition ,  San  Diego, 
California,  June  1989. 

[23]  W.-M.  Lin.  Mapping  image  algorithms  onto  fixed  size  window  architecture.  USC  Thesis 
in  preparation,  June  1990. 

[24]  W.-M.  Lin  and  V.K.  Prasanna  Kumar.  Parallel  architectures  for  discrete  relaxation 
algorithms.  Technical  report,  University  of  Southern  California,  June  1990. 

[25]  D.  Lowe.  Perceptual  Organization  and  Visual  Recognition.  Kluwer  Academic  Press, 
1985. 

[26]  D.  Lowe  and  T.  Binford.  Segmentation  and  aggregation:  An  approach  to  figure-ground 
phenomena.  In  Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Stanford, 
California,  September  1982. 

[27]  D.  Lowe  and  T.  Binford.  The  perceptual  organization  of  visual  images:  Segmentation  as 
a  basis  for  recognition.  In  American  Association  for  Artificial  Intelligence,  Washington, 
D.C.,  August  1983. 

[28]  G.  Medioni  and  R.  Nevatia.  Matching  images  using  linear  features.  IEEE  Transactions 
on  Pattern  Analysis  and  Machine  Intelligence,  PAMI-6(6):675-685,  November  1984. 

[29]  G.  Medioni  and  R.  Nevatia.  Segment-based  stereo  matching.  Computer  Graphics  and 
Image Processing, 31(1):2-18, July 1985.

[30]  S.  Menet,  P.  Saint-Marc,  and  G.  Medioni.  B-snakes:  Implementation  and  application 
to  stereo.  In  Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Pittsburgh, 
Pennsylvania,  September  1990. 

[31]  R.  Mohan.  Perceptual  Organization  for  Computer  Vision.  PhD  thesis,  University  of 
Southern  California,  August  1989.  IRIS  Technical  Report  254. 




[32]  R.  Mohan  and  R.  Nevatia.  Perceptual  grouping  for  the  detection  and  description  of 
structures in aerial images. In Proceedings of the DARPA Image Understanding
Workshop, pages 512-526, Boston, Massachusetts, April 1988.

[33]  R.  Mohan  and  R.  Nevatia.  Segmentation  and  description  based  on  perceptual  organi¬ 
zation.  In  Proceedings  of  the  Conference  on  Computer  Vision  and  Pattern  Recognition , 
pages  333-341,  San  Diego,  California,  June  1989. 

[34]  R.  Mohan  and  R.  Nevatia.  Using  perceptual  organization  to  extract  3-d  structures. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(11):1121-1139,
November  1989. 

[35]  J.  Mundy,  A.  Heller,  and  D.  Thompson.  The  concept  of  an  effective  viewpoint.  In 
Proceedings  of  the  DARPA  Image  Understanding  Workshop,  pages  651-659,  Cambridge, 
MA,  1988. 

[36]  R.  Nevatia  and  K.  R.  Babu.  Linear  feature  extraction  and  description.  Computer 
Graphics  and  Image  Processing ,  13:257-269,  1980. 

[37]  R.  Nevatia  and  K.  Price.  Locating  structures  in  aerial  images.  IEEE  Transactions  on 
Pattern  Analysis  and  Machine  Intelligence,  PAMI-4(5):476-484,  September  1982. 

[38]  R.  Ohlander,  K.  Price,  and  R.  Reddy.  Picture  segmentation  by  a  recursive  region 
splitting  method.  Computer  Graphics  and  Image  Processing,  8:313-333,  1978. 

[39]  S.  I.  Olsen.  Stereo  correspondence  by  surface  reconstruction.  IEEE  Transactions  on 
Pattern  Analysis  and  Machine  Intelligence ,  12(3):309-315,  March  1990. 

[40]  B.  Parvin  and  G.  Medioni.  A  constraint  satisfaction  network  for  matching  3d  objects. 
In  Proceedings  of  the  International  Conference  on  Neural  Networks,  volume  II,  pages 
281-286,  Washington,  D.C,  June  1989. 

[41]  B.  Parvin  and  G.  Medioni.  A  dynamic  system  for  object  description  and  correspondence. 
Submitted  to  International  Conference  on  Computer  Vision,  1990. 

[42]  J.  Ponce.  Ribbons,  symmetries,  and  skew  symmetries.  In  Proceedings  of  the  DARPA 
Image  Understanding  Workshop ,  pages  1074-1079,  Cambridge,  Massachusetts,  1988. 

[43] J. Ponce and D. J. Kriegman. On recognizing and positioning curved 3-D objects from
image  contours.  In  Proceedings  of  the  DARPA  Image  Understanding  Workshop ,  pages 
461-470,  Palo  Alto,  CA,  1989. 

[44]  K.  Rao.  Shape  Description  from  Sparse  and  Imperfect  Data.  PhD  thesis,  University  of 
Southern  California,  December  1988.  IRIS  Technical  Report  250. 

[45]  K.  Rao,  R.  Nevatia,  and  G.  Medioni.  Issues  in  shape  description  and  an  approach  for 
working  with  sparse  data.  In  Workshop  on  Spatial  Reasoning  and  Multi-Sensor  Fusion, 
pages  168-177,  Chicago,  October  1987. 




[46]  C.  Reinhart  and  R.  Nevatia.  Efficient  parallel  processing  in  high  level  vision.  In  Proceed¬ 
ings  of  the  DARPA  Image  Understanding  Workshop ,  Pittsburgh,  Pennsylvania,  Septem¬ 
ber  1990. 

[47]  A.  Rosenfeld.  Axial  representations  of  shape.  Computer  Vision,  Graphics,  and  Image 
Processing, 33(2):156-173, 1986.

[48]  A.  Rosenfeld,  R.  A.  Hummel,  and  S.  W.  Zucker.  Scene  labeling  by  relaxation  operations. 
IEEE  Transactions  on  Systems,  Man  and  Cybernetics,  SMC-6(6):420-453,  June  1976. 

[49]  P.  Saint-Marc,  J.-S.  Chen,  and  G.  Medioni.  Adaptive  smoothing:  A  general  tool  for  early 
vision.  In  Proceedings  of  the  Conference  on  Computer  Vision  and  Pattern  Recognition, 
San  Diego,  California,  June  1989. 

[50]  P.  Saint-Marc  and  G.  Medioni.  Adaptive  smoothing  for  feature  extraction.  In  Proceed¬ 
ings  of  the  DARPA  Image  Understanding  Workshop,  pages  1100-1113,  Boston,  Mas¬ 
sachusetts,  April  1988.  Morgan  Kaufmann  Publishers,  Inc. 

[51]  P.  Saint-Marc  and  G.  Medioni.  B-spline  contour  representation  and  symmetry  detection. 
In  First  European  Conference  on  Computer  Vision,  pages  604-606,  Antibes,  France, 
April  1990. 

[52]  K.  Sato  and  S.  Inokuchi.  Range-imaging  system  utilizing  nematic  liquid  crystal  mask.  In 
Proceedings  of  the  IEEE  International  Conference  on  Computer  Vision,  pages  657-661, 
June  1987. 

[53]  F.  Stein  and  G.  Medioni.  Efficient  two-dimensional  object  recognition.  In  Proceedings  of 
the  International  Conference  on  Pattern  Recognition,  volume  1,  pages  13-17,  Atlantic 
City,  New  Jersey,  June  1990. 

[54]  F.  Stein  and  G.  Medioni.  Toss  -  a  system  for  efficient  three  dimensional  object  recog¬ 
nition.  In  Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Pittsburgh, 
Pennsylvania,  September  1990. 

[55]  K.  Stevens  and  A.  Brookes.  Detecting  structures  by  symbolic  constructions  on  tokens. 
Computer Vision, Graphics, and Image Processing, 37:238-260, 1987.

[56] F. Ulupinar and R. Nevatia. Using symmetries for analysis of shape from contour. In
Proceedings  of  the  IEEE  International  Conference  on  Computer  Vision,  pages  414-427, 
Tampa,  Florida,  December  1988. 

[57] F. Ulupinar and R. Nevatia. Inferring shape from contour for curved surfaces. In
Proceedings  of  the  International  Conference  on  Pattern  Recognition,  volume  1,  pages 
147-154,  Atlantic  City,  New  Jersey,  June  1990. 

[58] F. Ulupinar and R. Nevatia. Recovering shape from contour for SHGCs and CGCs. In
Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Pittsburgh,  Pennsylvania, 
September  1990. 




[59] B.C. Vemuri and J.K. Aggarwal. 3-D model construction from multiple views using
range  and  intensity  data.  In  Proceedings  of  the  Conference  on  Computer  Vision  and 
Pattern  Recognition ,  pages  435-437,  1986. 

[60] A. Witkin and J. Tenenbaum. What is perceptual organization for? In Proceedings of
the  International  Joint  Conference  on  Artificial  Intelligence ,  Karlsruhe,  W.  Germany, 
August  1983. 

[61]  P.  Wright  and  N.  Ashford.  Transportation  Engineering:  Planning  and  Design.  John 
Wiley  and  Sons,  1989. 




3  Detailed  Technical  Papers 

This section reproduces papers published in workshop or conference proceedings;
they contain the technical details of our work.

•  Efficient  Two-Dimensional  Object  Recognition,  F.  Stein  and  G.  Medioni.  Published  in 
the  Proceedings  of  the  10th  International  Conference  on  Pattern  Recognition,  Atlantic 
City,  New  Jersey,  June  16-21,  1990,  pp.  13-17. 

•  TOSS  -  A  System  for  Efficient  Three  Dimensional  Object  Recognition,  F.  Stein  and  G. 
Medioni.  Published  in  the  Proceedings  of  the  DARPA  Image  Understanding  Workshop, 
Pittsburgh,  Pennsylvania,  September,  1990,  pp.  537-543. 

•  Inferring  Shape  from  Contour  for  Curved  Surfaces,  F.  Ulupinar  and  R.  Nevatia.  Pub¬ 
lished  in  the  Proceedings  of  the  10th  International  Conference  on  Pattern  Recognition, 
Atlantic  City,  New  Jersey,  June  16-21,  1990,  pp.  147-154. 

•  Recovering  Shape  from  Contour  for  SHGCs  and  CGCs,  F.  Ulupinar  and  R.  Nevatia. 
Published  in  the  Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Pitts¬ 
burgh,  Pennsylvania,  September,  1990,  pp.  544-556. 

•  B-Spline  Contour  Representation  and  Symmetry  Detection,  P.  Saint-Marc  and  G. 
Medioni.  Published  in  the  Proceedings  of  the  First  European  Conference  on  Com¬ 
puter  Vision,  Antibes,  France,  April  1990,  pp.  604-606. 

•  Efficient  Parallel  Processing  in  High  Level  Vision,  C.  Reinhart  and  R.  Nevatia.  Pub¬ 
lished  in  the  Proceedings  of  the  DARPA  Image  Understanding  Workshop,  Pittsburgh, 
Pennsylvania,  September,  1990,  pp.  829-839. 




Efficient Two Dimensional Object Recognition

Fridtjof  Stein  and  Gerard  Medioni 

Institute  for  Robotics  and  Intelligent  Systems 
Powell  Hall  204 

University  of  Southern  California 
Los  Angeles,  California  90089-0273 

Email: stein@iris.usc.edu


Abstract 

We  address  the  problem  of  recognition  of  multiple  flat 
objects  in  a  cluttered  environment  from  an  arbitrary  view¬ 
point  (weak  perspective).  The  models  are  acquired  au¬ 
tomatically  and  initially  approximated  by  polygons  with 
multiple  line  tolerances  for  robustness.  Groups  of  con¬ 
secutive  segments  (super  segments)  are  then  Gray  coded 
and  entered  into  a  hash  table.  This  provides  the  essen¬ 
tial  mechanism  for  indexing  and  fast  retrieval.  Once  the 
data  base  of  all  models  is  built,  the  recognition  proceeds 
by  segmenting  the  scene  into  a  polygonal  approximation; 
the  Gray  code  for  each  super  segment  retrieves  model  hy¬ 
potheses  from  the  hash  table.  Hypotheses  are  clustered  if 
they  are  mutually  consistent,  and  represent  the  instance 
of  a  model.  Finally,  the  estimate  of  the  transformation  is 
refined.  This  methodology  allows  us  to  recognize  models 
in the presence of noise, occlusion, scale, rotation, translation
and weak perspective. Unlike most of the current
systems, its complexity grows as $O(kN)$, where $N$ is the
number of models and $k \ll 1$.


1  Introduction 

Object  recognition  involves  identifying  a  correspondence  between 
part  of  an  image  and  a  particular  view  of  a  known  object.  This 
requires  matching  the  image  against  stored  object  models  to  de¬ 
termine  if  any  of  the  models  could  produce  a  portion  of  the  image. 
In  the  last  twenty  years  several  systems  have  been  developed  to 
deal  with  the  problem  of  model-based  object  recognition  in  a 
scene. Most of these systems try to solve the matching task
through tree search, searching through all promising matches.
There have been numerous attempts to deal with the complexity
issue, which makes recognition slow and ineffective in most of
these systems. Often the focus of the research is directed
towards  the  reduced  task  of  recognizing  only  one  or  two  objects 
in  a  scene.  But  even  then,  the  computational  complexity  is  ex¬ 
ponential  for  nontrivial  scenes. 

Grimson and Lozano-Pérez [7, 8] describe a system which is
able to recognize objects from sparse scene data. If there are $m$
known objects with $n_j$ segments each and $s$ scene segments, there
are $\sum_{j=1}^{m} (n_j)^s$ combinations of pairings between scene and model
segments.  The  system  tests  these  combinations  using  a  con¬ 
strained  tree  search.  The  number  of  combinations  that  need  to  be 
tested  grows  rapidly  with  object  complexity.  To  meet  the  com¬ 
plexity  challenge,  Ullman  and  Huttenlocher  [10,  11]  suggested  in 
their  alignment  approach  the  use  of  a  minimal  amount  of  infor¬ 
mation  with  highly  descriptive  features.  They  were  the  first  to 
offer  a  solution  with  polynomial  time  complexity.  The  model  is 
first  aligned  with  an  image  using  a  small  number  of  pairs  of  model 
and  image  features,  and  then  the  aligned  model  is  compared  di¬ 
rectly  against  the  image  by  mapping  the  object  model  into  image 
coordinates.  Another  way  to  avoid  explosion  of  complexity  due 
to extensive search is a good indexing mechanism. Indexing can
reduce the search space effectively. A smart approach based on
some  heuristics  and  under  the  assumption  of  fixed  scale  was  de¬ 
scribed  by  Knoll  and  Jain  [13]  in  their  paper,  which  is  based  on 
what  they  call  feature  indexed  hypotheses.  They  take  advantage 
of  the  similarities  and  differences  between  model  types  to  group 
candidate  models.  For  each  feature,  a  list  is  kept  of  where  it 
occurs  in  each  object  type.  When  a  match  is  found  for  a  feature 
in  an  image,  models  are  hypothesized  for  each  object  identity 
and  orientation  in  the  feature’s  list.  Each  of  these  hypotheses  is 
then  tested  using  a  template  match  to  determine  which,  if  any, 
are  correct.  A  system  which  is  able  to  recognize  models  from 
a  data  base  with  up  to  100  models  was  introduced  by  Kalvin, 
Schonberg,  Schwartz  and  Sharir  [12].  They  assume  fixed  scale 
and  concentrate  their  representation  effort  on  the  segmentation 
of  the  boundary  in  boundary  parts,  which  are  likely  to  belong  to 
one model, and which they call footprints. These footprints are
used to match a scene against a data base with a hashing scheme.
They call this indexing mechanism geometric hashing. Another
method based on indexing was suggested by Lamdan, Schwartz
and Wolfson [14, 15]. They have developed an algorithm which
deals successfully with the combinatorial explosion of possible
interpretations of a model by using the scene coordinates as an
index for a voting scheme. This method is fast for a single model
in a scene with few additional points that do not belong to the
model. But with multiple models and cluttered scenes containing
many extra points, or when a model is not in the scene, the
system suffers from ineffective search.

By  reviewing  the  object  recognition  systems  of  the  past  we 
encounter  the  following  problems,  formulated  as  questions: 
Generality Will the system work for any object, or do we have
to use different methods for smooth or convex objects?

Stability Can the system recognize an object segmented differently
(because of scale, noise, quantization, ...)?

Robustness Will it work on real data, with noise and quantization?

Viewpoint  Can  the  system  handle  a  wide  range  of  viewpoints 
and  therefore  take  into  account  translation,  rotation,  scale 
and  perspective? 

Multiple  Instances  Can  we  deal  with  multiple  instances  of  a 
model  in  a  scene? 

Good  Worst  Case  Performance  Can  the  system  still  be  fast 
when  no  model  is  in  the  scene? 

Performance  What  are  the  effects  of  model  size,  scene  size  and 
number  of  models  on  the  performance? 

We  address  these  issues  in  our  system. 

The  paper  is  organized  as  follows.  In  Section  2  we  talk  about 
super  segments  and  their  representation.  Section  3  focuses  on  the 
representation  of  models  and  scenes,  their  matching,  the  verifi¬ 
cation  process  and  a  discussion  of  the  complexity  of  our  system. 
In Section 4 we present some examples.


2 Representation

2.1 Basic Idea

Our representation of a model or a scene is based on a polygonal
approximation. We do not depend on any feature detection
algorithm and we do not explicitly handle distinguished points like
corners or inflection points. Our opinion is that curvature is the
most important feature of a general curve. It is invariant with
regard to scale, rotation and translation. When we include some
redundancy, we can also add weak perspective to that list.
By using a polygonal approximation we lose most of the curvature
information, but we keep part of this information in the
angles between consecutive line segments (see also Lowe’s SCERPO
system [16]). Obviously, there is no unique polygonal approximation
for a curve (see for example Figure 1). Therefore, for the
purpose of robustness, we use several polygonal approximations
with different line fitting tolerances.


Figure 1: Elephant Shape (a) and Different Polygonal Approximations
(b) and (c)


Since we want to handle occlusion, we do not expect to obtain
complete boundaries in our scenes, but only portions of them.
On the other hand, individual segments are too local to be useful
as matching primitives. Grouping a fixed number of adjacent
segments provides us with our basic features, the super segments.
As shown in Figure 2, super segments are characterized by their
cardinality (number of segments), angles (between consecutive
segments), and the arclength (sum of the segment lengths). In
addition we define the location as the middle vertex of a super
segment (we only use super segments of even cardinality), the
orientation (the vector between the predecessor and the successor
of the middle vertex), the normal vector (the vector normal to
the orientation vector), and the second moment ratio. The second
moment is represented by the ellipse computed from the covariance
matrix of the vertices. The two eigenvectors of the covariance
matrix define the axes of the ellipse. We use the eccentricity
(ratio of the lengths of the two axes) as a feature (the second
moment ratio) for super segments.


Figure 2: Super Segment
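
The attributes listed above can be computed directly from the vertices of a polyline. The following is only a minimal sketch under stated assumptions: the function name and the returned dictionary are hypothetical, the angle is taken as the signed turning angle in degrees, and the second moment ratio is taken as minor axis over major axis. The paper does not fix these conventions.

```python
import math


def super_segment_features(vertices):
    """Attributes of a super segment given as a list of (x, y) vertices.

    Assumed conventions (not fixed by the paper): signed turning angles
    in degrees, second moment ratio = minor axis / major axis.
    """
    n_seg = len(vertices) - 1
    if n_seg < 2 or n_seg % 2:
        raise ValueError("need an even number (>= 2) of segments")

    # Signed turning angle at every interior vertex, in (-180, 180].
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(vertices, vertices[1:], vertices[2:]):
        ux, uy, vx, vy = x1 - x0, y1 - y0, x2 - x1, y2 - y1
        cross, dot = ux * vy - uy * vx, ux * vx + uy * vy
        angles.append(math.degrees(math.atan2(cross, dot)))

    # Arclength: sum of the segment lengths.
    arclength = sum(
        math.hypot(qx - px, qy - py)
        for (px, py), (qx, qy) in zip(vertices, vertices[1:])
    )

    # Location is the middle vertex; orientation joins its two neighbours.
    mid = n_seg // 2
    ox = vertices[mid + 1][0] - vertices[mid - 1][0]
    oy = vertices[mid + 1][1] - vertices[mid - 1][1]

    # Eigenvalues of the 2x2 covariance matrix of the vertices (closed form).
    n = len(vertices)
    mx = sum(x for x, _ in vertices) / n
    my = sum(y for _, y in vertices) / n
    sxx = sum((x - mx) ** 2 for x, _ in vertices) / n
    syy = sum((y - my) ** 2 for _, y in vertices) / n
    sxy = sum((x - mx) * (y - my) for x, y in vertices) / n
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam_max, lam_min = half_trace + root, half_trace - root
    # Ellipse axis lengths scale with the square roots of the eigenvalues.
    ratio = math.sqrt(max(lam_min, 0.0) / lam_max) if lam_max > 0 else 0.0

    return {
        "cardinality": n_seg,
        "angles": angles,
        "arclength": arclength,
        "location": vertices[mid],
        "orientation": (ox, oy),
        "normal": (-oy, ox),
        "second_moment_ratio": ratio,
    }


features = super_segment_features([(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)])
```

For the staircase polyline in the last line, the sketch yields a cardinality of four and three turning angles that alternate in sign.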

As  mentioned  before,  we  are  mainly  interested  in  the  curva¬ 
ture  information  implicitly  captured  by  the  super  segment  an¬ 
gles.  That  is  the  reason  why  we  use  them  to  encode  a  super 
segment.  To  avoid  establishing  matches  between  super  segments 
which  have  the  same  angles  but  totally  different  shapes,  we  add 
the  measure  of  eccentricity  (second  moment  ratio)  to  our  coding 
scheme. 


2.2  Gray  Coding 

One of our goals is to devise a robust representation to help
reduce the problems associated with low level segmentation.
Connell and Brady [3] used the Gray code [9] approach to get a
difference metric for a learning system designed to learn object
shapes. They developed a technique to compare different data
types, and this is used in our approach. Gray coding is a
generalized quantization scheme. It is not the only scheme with
which our method works; other quantizations would perform
equally well. However, Gray coding provides a clean way of
quantizing and of computing a difference without hiding the
mechanisms in the system. The use of the Gray code (see
Figure 3 (a)) is important in digital communication. The loss of
one bit of information in a Gray coded number changes the value
it represents by only one. In this way, the semantic difference
(difference between the values) corresponds to the syntactic
difference (the Hamming distance). This property makes it useful
for all applications where redundancy is desirable. In digital
communication, this property means that the effect of losing any one
bit is uniformly noncatastrophic. In computer vision, this
property can help compare slightly different representations.


Figure 3: Gray Code (a), Gray Code for a Set (b), Gray Code
for Angles (c)


First we have to generalize the Gray code. To encode intervals, which may
either be continuous or discrete, we can use overlapping ranges.
As illustrated in Figure 3 (b) we create a set of overlapping
binary predicates A through K whose ranges cover the interval. This
means that a particular value is encoded by the set of intervals
it is in: 2 is encoded as $\mathrm{Gray}(2) = \{A, C, F, J\}$, 4 is encoded
as $\mathrm{Gray}(4) = \{A, D, G, J\}$, and 7 is encoded as
$\mathrm{Gray}(7) = \{B, D, H, K\}$. We define a difference metric as
$\Delta(i, j)$ = number of different predicates between the Gray codes
of $i$ and $j$. Therefore we get a maximum distance $\Delta_{\max}$ = number
of predicate layers (in our example $\Delta_{\max} = 4$). The difference
between 2 and 4 is $\Delta(2, 4) = 2$ and the difference between 2 and 7
is $\Delta(2, 7) = 4$.
The chosen Gray coding in our example is not the only possible
one. By changing the range of the predicates or by adding new
predicates, we can refine the resolution of our representation.
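
The generalized Gray code and its difference metric can be sketched in a few lines. The layer boundaries below are hypothetical: the actual ranges of Figure 3 (b) are not recoverable here, so the cut points were chosen only so that the three codes quoted in the text (for 2, 4 and 7) come out the same. The names `LAYERS`, `gray` and `delta` are likewise assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical predicate layers: (cut points, predicate names per range).
# Each layer splits the value range; ranges of different layers overlap.
LAYERS = [
    ([5], "AB"),
    ([3, 8], "CDE"),
    ([4, 6, 9], "FGHI"),
    ([7], "JK"),
]


def gray(value):
    """Return one predicate per layer: the range the value falls into."""
    return tuple(names[bisect_right(cuts, value)] for cuts, names in LAYERS)


def delta(i, j):
    """Difference metric: number of layers whose predicates differ."""
    return sum(p != q for p, q in zip(gray(i), gray(j)))


codes = {v: gray(v) for v in (2, 4, 7)}
d_near = delta(2, 4)
d_far = delta(2, 7)
```

With these assumed cut points, nearby values share most predicates and distant values share none, which is the behavior the metric is meant to capture.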

Figure  3  (c)  shows  the  Gray  coding  of  super  segments  of  cardi¬ 
nality  two.  They  consist  of  two  segments  and  one  angle  between 
them.  The  angle  can  lie  in  a  range  [-180°, +180°]  and  is  used 
to Gray code the super segments. For example, super segment 1 is
encoded as $\{A, C, F, J\}$ and super segment 2 as $\{A, C, G, J\}$.
The difference of 1 reflects our intuition that the two super
segments are rather similar.

Gray coding can be easily extended to higher dimensions. The
super segment in Figure 2 can be encoded by taking all the angles,
encoding each one separately (with a different Gray code
table for disjunct predicates) and keeping all the predicates in
one set. In this example, the Gray code of the super segment
is the set
$\{\mathrm{Gray}(\alpha_1), \mathrm{Gray}(\alpha_2), \mathrm{Gray}(\alpha_3)\} = \{\{P_{11}, P_{12}, \ldots, P_{1n}\}, \{P_{21}, P_{22}, \ldots, P_{2n}\}, \{P_{31}, P_{32}, \ldots, P_{3n}\}\}$,
where the $P_{ij}$ are the predicates and $n = \Delta_{\max}$.
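
The extension to several angles can be illustrated as follows. This is a sketch under assumptions, not the coding of Figure 3 (c): each angle is quantized by two hypothetical layers of equal-width bins, the second shifted by half a bin, and the angle index is stored in every predicate so that the predicate sets of different angles stay disjoint. The `size` parameter is assumed to divide 360.

```python
def gray_angle(angle_deg, index, size=30.0):
    """Two staggered layers of `size`-degree bins for one angle.

    `index` tags the predicates so that each angle of a super segment
    uses its own (disjoint) set of predicates.
    """
    n_bins = int(round(360.0 / size))
    a = (angle_deg + 180.0) % 360.0           # map [-180, 180) to [0, 360)
    plain = int(a // size) % n_bins           # layer 0: bins start at 0
    shifted = int(((a + size / 2.0) % 360.0) // size) % n_bins  # layer 1
    return ((index, 0, plain), (index, 1, shifted))


def gray_super_segment(angles, size=30.0):
    """Gray code of a super segment: one predicate tuple per angle."""
    return tuple(gray_angle(a, i, size) for i, a in enumerate(angles))


def delta(code_a, code_b):
    """Number of predicates that differ between two super segment codes."""
    return sum(
        p != q
        for angle_a, angle_b in zip(code_a, code_b)
        for p, q in zip(angle_a, angle_b)
    )


model = gray_super_segment([10.0, -40.0, 95.0])
noisy = gray_super_segment([12.0, -38.0, 97.0])
bent = gray_super_segment([10.0, -40.0, 110.0])
```

In this toy setting a perturbation of two degrees leaves the code unchanged, while bending the last angle by fifteen degrees changes one predicate.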


3  Recognition 


3.1  Object  Representation 


As  mentioned  in  the  previous  sections,  we  want  to  represent  our 
model  (or  scene)  with  super  segments.  Therefore  we  first  apply 
an  edge  detection  algorithm  on  the  image  with  the  model  (or 
scene).  We  use  the  Canny  edge  detector  [4]  for  grey  level  images 
and  a  simple  boundary  tracer  for  binary  images.  The  resulting 
edgels  are  further  processed  with  a  line  fitting  algorithm  to  com¬ 
pute  the  polygonal  approximations.  Connected  linear  segments 
form  chains  of  adjacent  segments.  The  segment  chains  provide 
the  super  segments  by  grouping  a  fixed  number  of  adjacent  seg¬ 
ments.  We  then  take  all  the  super  segments,  encode  them  and 
take  the  resulting  predicates  as  a  key  for  a  hash  table,  where  we 
record the super segment as an entry (see Figure 4). That is,
every super segment is stored under the predicates that
represent the intervals in which its angles (or other attributes)
lie.
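
The indexing step described above can be sketched with an ordinary dictionary used as the hash table. Everything named here is an assumption made for illustration: `encode` is a simplified stand-in for the Gray coding (one bin index per angle), and the model data are invented angle triples, not data from the paper.

```python
from collections import defaultdict


def encode(angles, size=30.0):
    """Hypothetical stand-in for Gray coding: one bin index per angle."""
    return tuple(int(((a + 180.0) % 360.0) // size) for a in angles)


def build_table(models, size=30.0):
    """Store every model super segment under the key of its code."""
    table = defaultdict(list)
    for name, super_segments in models.items():
        for idx, angles in enumerate(super_segments):
            table[encode(angles, size)].append((name, idx))
    return table


def retrieve(table, scene_super_segments, size=30.0):
    """Return hypotheses (scene index, model name, model index)."""
    hypotheses = []
    for s_idx, angles in enumerate(scene_super_segments):
        for name, m_idx in table.get(encode(angles, size), []):
            hypotheses.append((s_idx, name, m_idx))
    return hypotheses


# Invented example data: each super segment is given by its three angles.
models = {
    "elephant": [[10.0, -40.0, 95.0], [80.0, 20.0, -30.0]],
    "plane": [[12.0, -38.0, 97.0]],
}
table = build_table(models)
scene = [[11.0, -39.0, 96.0], [170.0, 170.0, 170.0]]
hypotheses = retrieve(table, scene)
```

A lookup costs one key computation per scene super segment, independent of how many models were entered; a scene super segment with no matching key produces no hypotheses at all.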


Figure  4:  Representation  of  a  Model 


3.2  Matching 

By  using  indexing  for  the  matching  process,  we  only  select  a  small 
set  of  candidate  models  that  are  likely  to  be  present  in  the  im¬ 
age.  We  assume  that  most  objects  in  our  data  base  (hash  table) 
are  redundantly  specified  by  their  super  segments.  The  scene  is 
preprocessed  as  explained  in  Section  3.1  to  generate  all  the  su¬ 
per  segments.  These  are  Gray  coded  and  the  predicates  are  used 
to  retrieve  the  matching  hypotheses  between  the  super  segments 
of model and scene (see Figure 5). Two super segments $s_1$ and
$s_2$ match if $\Delta(s_1, s_2) = 0$. Retrieving matches with larger Gray
distance is possible by substituting predicates with neighboring
predicates.


Figure 5: Matching of Model(s) with Scene


3.3 Verification

We compute all possible matches for the super segments of the
scene with the model super segments to generate multiple
hypotheses. Next, we divide these $n$ hypotheses $H = \{h_1, h_2, \ldots, h_n\}$
according to which model the model super segment of the
hypothesis belongs to. We store these into a correspondence
table where we have the models $m_i$ as keys and the $i_k$ hypotheses
$H_i = \{h^i_1, h^i_2, \ldots, h^i_{i_k}\}$ (with $H_i \subseteq H$) as entries (see Figure 6).


Figure 6: Verification of Hypotheses (panels: hypotheses;
correspondence table; constraints: distance, angle, direction)


The next step is the formation of consistent clusters. For every model
$m_i$ we have to check which hypotheses $h^i_x$ and $h^i_y$ with $h^i_x \neq h^i_y$
are consistent with each other. We do not check every hypothesis
against every other; instead, we adopt the criterion that three
consistent hypotheses are sufficient to instantiate the model in
the scene. If we have three consistent hypotheses $C = \{h^i_x, h^i_y, h^i_z\}$
with $C \subseteq H_i$ for one model $m_i$, we examine the remaining
hypotheses in $H_i \setminus C$ and collect those that are consistent with
at least one of the selected three in $C$. When we have found
one instance, represented by $I = C \cup F$, with $F$ the additional
consistent hypotheses found, we try to find more instances in the
remaining hypotheses $H_i \setminus I$.

But what is meant by consistency? We use the powerful
constraints (distance, angle, and direction) introduced by Grimson
and Lozano-Pérez [7, 8] to prune the interpretation trees
efficiently and to build our clusters. In the two dimensional domain,
these three constraints define the attitude of one feature relative
to another, since they specify the three degrees of freedom
(two translational and one rotational).

After we group the hypotheses into clusters which represent
instances of models, we can compute the transformation from the
model coordinates to the scene coordinates by applying a least
squares calculation on all the matching super segments. Because
of noise, we get in general a good first guess for the
transformation but not an exact match. A second least squares match on
corresponding corners or segments can refine the result. This is
similar to the refinement procedures used by Lamdan [14, 15],
Huttenlocher [10, 11], and Mundy [17].

3.4 Complexity Analysis

By addressing the complexity issue, we can study the behavior of
the system when we have to deal with large data bases, meaning
more than one or two models. In order to simplify the
calculations, we make the assumptions that every model has the same
number of super segments and that the entries are equally
distributed over the hash table. When we view $q$ (the number of
super segments in the scene) as constant, we can prove that the
overall cost is

$$O_{\mathrm{recognize}} = O_{\mathrm{match}} + O_{\mathrm{verify}} = O(q) + O(M) = O(M),$$

where $M$ is the number of models in the data base. It is
interesting to note that the verification is the step which contributes
to the linear cost, whereas the retrieval time is constant. The
question remains: what is the difference between our approach
and a recognition system that tries to recognize one model after
the other and therefore also has the cost of $O(M)$? The main
difference lies in the slope of the linear cost function, dependent
on the occurrence of a model in the scene or its absence. In our
results (Section 4.4) the cost for detecting the absence of a model
is less than 10% of the cost for the detection of the occurrence.

4  Examples  and  Performance 

4.1  Parameters  for  Gray  Coding 

To  encode  the  super  segments,  we  need  to  choose  the  parameters 
for  the  Gray  coding.  What  are  the  guidelines  for  setting  the 
values  for  the  cardinality  of  the  super  segments  and  the  interval 
size  of  the  Gray  coding? 

The  cardinality  issue  reduces  to  the  question  of  how  local  or 
global  our  representation  should  be.  The  higher  the  cardinality, 
the  more  global  and  descriptive  is  the  representation,  but  we  have 
to  rely  on  the  existence  of  long  scene  super  segments  belonging 
to  one  model.  The  lower  the  cardinality,  the  more  local  the  de¬ 
scription  and  the  more  false  matches  are  retrieved.  The  answer 
lies  in  the  scene: 

complex  shapes  We  should  use  super  segments  of  higher  car¬ 
dinality  (like  the  animal  shapes  in  Figure  7). 
simple  shapes  The  use  of  super  segments  with  lower  cardinal¬ 
ity  is  promising  for  good  recognition  performance  (like  the 
aircraft  shapes  in  Figure  8). 

The  best  answer  to  the  question  is  the  use  of  a  multi  cardinality 
representation,  where  we  prefer  matches  of  super  segments  with 
higher  cardinality  to  super  segments  with  lower  cardinality.  In 
our  implementation,  however,  we  set  the  cardinality  to  a  fixed 
value. 

The interval size of the Gray coding is a measure of how
redundant  the  representation  is.  In  the  scenes  with  the  animal 
shapes  (Figure  7)  we  used  a  relatively  small  interval  size  of  30°. 
For  the  grey  level  image  with  the  aircraft  scene  (Figure  8)  in 
which  we  have  to  handle  noisy  edges,  we  use  a  larger  interval  size 
of  60°. 
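
The effect of the interval size can be illustrated with a small quantizer. This is a minimal sketch, assuming the standard reflected binary Gray code and taking the number of differing bits as the distance between two code words; the exact encoding and predicates used by the system may differ. All function names are hypothetical.

```python
def angle_bin(angle_deg, interval_deg):
    """Index of the quantization interval that contains the angle."""
    return int((angle_deg % 360.0) // interval_deg)


def gray_code(index):
    """Reflected binary Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def gray_distance(code_a, code_b):
    """Number of bit positions in which two code words differ."""
    return bin(code_a ^ code_b).count("1")


def encode_angle(angle_deg, interval_deg):
    """Gray code word of the interval an angle falls into."""
    return gray_code(angle_bin(angle_deg, interval_deg))
```

With an interval of 30° there are 12 bins, with 60° there are 6. Neighboring bins differ in exactly one bit, so an angle pushed across a bin boundary by noise changes its code word only minimally. In this plain code the wrap-around from the last bin to the first is not a one-bit change for 6 or 12 bins, which a real implementation would have to handle separately.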

4.2  Animal  Shapes 

We  have  tested  our  system  with  a  library  of  twelve  animal  shapes 
(Figure  7  (a)).  We  obtained  them  from  coarsely  digitized  binary 
images.  We  look  for  objects  with  a  large  variety  of  features.  We 
want  simple  objects,  complex  objects,  and  objects  with  a  lot  of 
similarities.  We  set  the  super  segment  cardinality  to  6.  As  the 
key  for  our  matching  we  take  the  Gray  code  of  the  angles  of  the 
super  segment  and  the  Gray  code  of  the  ratio  of  the  second  mo¬ 
ment  of  the  vertices  of  the  super  segment.  Each  angle  is  encoded 
with an interval of 30°, which corresponds to 12 intervals. For
the ratio of the second moment we take 5 intervals. Therefore the
maximum matching hash table size is $5 \cdot 12^6 \approx 15 \cdot 10^6$ entries.
After putting the shapes of the twelve animals into the hash table,
it has a size of 1198 entries (≈ 0.01 % of maximum). The animal
scene  was  taken  by  printing  the  three  animals,  enlarging  them, 
and  cutting  them  out.  Finally  we  put  the  three  silhouettes  on 
a  light  table  and  took  a  picture  with  a  video  camera.  The  cam¬ 
era  had  an  angle  of  about  20°  to  the  normal  of  the  light  table. 
This  procedure  guaranteed  scaling,  occlusion,  rotation,  transla¬ 
tion,  weak  perspective,  and  noise  (thanks  to  our  high  skills  in 
cutting  out  -  look  at  the  ears  of  the  giraffe).  The  recognition 
time  (not  including  representation  generation  time)  for  the  scene 
(see  Figure  7  (b)  and  (c))  is  4.1  seconds  on  a  Symbolics  3675  lisp 
machine. 
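
The table size figures can be checked with a few lines of arithmetic. This is a minimal sketch; the number of encoded angle slots (six) is taken from the quoted maximum of about 15 million entries rather than derived from the cardinality, and the function names are hypothetical.

```python
def max_table_size(angle_intervals, angle_slots, moment_intervals):
    """Number of distinct keys: one moment code times one code per angle slot."""
    return moment_intervals * angle_intervals ** angle_slots


def occupancy_percent(entries, maximum):
    """Share of the possible keys that are actually occupied, in percent."""
    return 100.0 * entries / maximum


# Animal library: 12 angle intervals, 5 moment intervals, 1198 occupied entries.
animal_max = max_table_size(12, 6, 5)
animal_fill = occupancy_percent(1198, animal_max)
```

Here `animal_max` evaluates to 14929920 and `animal_fill` to about 0.008, which rounds to the 0.01 % quoted above. The table is therefore very sparsely populated.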

4.3  Aircraft  Shapes 

In  another  example  (see  Figure  8)  we  try  to  recognize  airplanes 
from  an  aerial  photograph  scene  of  an  airport.  The  data  base 
consists  only  of  one  airplane  model  (see  Figure  8  (a)).  We  cre¬ 
ated  the  polygonal  approximations  manually.  We  set  the  super 
segment  cardinality  to  4.  We  take  as  key  for  our  matching  the 
Gray  code  of  the  three  angles  of  the  super  segments  and  their 
second  moment  ratio.  Each  angle  is  encoded  with  an  interval  of 
60°,  which  corresponds  to  6  intervals.  For  the  ratio  of  the  second 
moment  we  take  5  intervals.  Therefore  the  maximum  matching 
hash table size is $5 \cdot 6^3 = 1080$ entries. After putting the shape
of the airplane into the hash table it has a size of 98 entries
(≈ 9 % of maximum). The recognition time (not including representation
generation time) for the four planes in the scene is
8.33 seconds. Table 1 shows that the time to
verify  is  relatively  longer  than  in  the  animal  example.  The  reason 
is  that  we  have  four  instances  whose  hypotheses  all  had  the  same 
model  index.  To  avoid  complexity  problems  we  include  an  inter¬ 
mediate  indexing  step.  Based  on  the  fact  that  super  segments 
which  belong  to  one  airplane  in  the  scene  have  similar  transfor¬ 
mation  parameters,  we  use  the  translational  part  to  preindex  the 
hypotheses. 
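
The intermediate indexing step can be sketched as a coarse binning of the hypotheses. This is a minimal sketch, assuming that each hypothesis is a dictionary with a `"translation"` entry and that a fixed grid cell size is adequate; both the data layout and the function name are hypothetical.

```python
from collections import defaultdict


def preindex_by_translation(hypotheses, cell_size):
    """Group hypotheses whose translation falls into the same grid cell."""
    buckets = defaultdict(list)
    for hyp in hypotheses:
        tx, ty = hyp["translation"]
        key = (int(tx // cell_size), int(ty // cell_size))
        buckets[key].append(hyp)
    return dict(buckets)
```

Consistency checks then only have to be run inside each bucket instead of across all hypotheses for the same model index. A hard grid can split one instance across two neighboring cells, so a practical version would probably also inspect adjacent cells.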

4.4  Large  Data  Base 

In  this  example  we  direct  our  attention  to  a  large  data  base.  We 
create 200 models by randomly overlapping more than 3 and fewer
than 7 random triangles. Our models consist, on average, of
48  super  segments.  We  set  the  super  segment  cardinality  to  6. 
As  the  key  we  take  the  Gray  code  of  all  angles  and  the  second 
moment  ratio.  Each  angle  is  encoded  with  an  interval  of  60° 
which  corresponds  to  6  intervals.  For  the  ratio  of  the  second 
moment  we  take  5  intervals.  Therefore  the  maximum  matching 
hash table size is $5 \cdot 6^5 = 38880$ entries. Our scene is generated
by taking three models out of the 200. We rotate and overlap
them  artificially.  Then  we  let  the  system  recognize  the  scene  with 
different  data  base  sizes.  The  results  are  shown  in  the  graphs  in 
Figure  9.  There  are  several  interesting  observations: 

•  The  hash  table  does  not  grow  linearly.  There  is  a  certain 
saturation effect. The cause is the non-uniform distribution of
the  (encoded)  super  segments  in  the  hash  table. 

•  The  three  models  in  the  scene  were  the  first  which  we  put 
into  the  data  base.  Therefore  the  recognition  time  increases 
very  fast  for  the  first  three  models.  The  slope  is  approxi¬ 
mately  1.3  seconds  per  model. 

•  Adding  more  models  to  the  database  has  little  effect  on  the 
performance.  The  recognition  time  curve  has  in  its  linear 
range  the  slope  of  approximately  0.1  seconds  per  model. 
The  saturation  effect  results  in  an  increasing  steepness.  The 
cause  is  the  increase  of  the  average  number  of  hypotheses 
per  hash  table  entry. 
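
The two slopes reported above can be compared with the statement in Section 3.4 that detecting the absence of a model costs less than 10% of detecting its occurrence. As a rough check, with $c_{\text{present}}$ and $c_{\text{absent}}$ denoting the per-model cost in the two regimes (symbols introduced here only for illustration):

```latex
\[
\frac{c_{\text{absent}}}{c_{\text{present}}}
  \approx \frac{0.1\ \text{s/model}}{1.3\ \text{s/model}}
  \approx 0.077 < 0.10 .
\]
```

This ratio only describes the linear range of the curve; the saturation effect mentioned in the last observation is not captured by it.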

5  Conclusion  and  Future  Research 

The results in the previous section illustrate the fact that indexing
is a powerful tool. Combined with the redundant representation
of the Gray code it can handle the uncertainty of segmentation,
and is also fast. Our basic feature, the super segment, provides
us with enough information to decide whether a model might be
in an image scene or not. We believe that this approach could
have applications in other fields of Computer Vision.

We are currently extending the Gray coding idea to three dimensional
object recognition from three dimensional data. We have
very promising results. Our goal is to be able to recognize complex
three dimensional objects from two dimensional grey value
images. The application of the developed method to real imagery
must address the difficulties of occlusion, diffuse shadows
and different lighting conditions.

We are also currently investigating the possibility of parallel
implementation.  The  data  structure  of  a  hash  table  is  suitable 
for  parallel  processing.  Therefore  we  could  perform  the  matching 
step  in  parallel.  Combined  with  a  constraint  satisfaction  mod¬ 
ule  to  perform  the  verification  task,  we  dare  to  expect  object 
recognition  results  in  real  time. 
Abstract

We  present  an  approach  for  the  recognition  of  multiple 
three  dimensional  object  models  from  three  dimensional 
scene  data.  We  are  addressing  the  problem  in  a  realis¬ 
tic  environment:  the  viewpoint  is  arbitrary,  the  objects 
vary  widely  in  complexity,  and  we  make  no  assumptions 
about  the  structure  of  the  surface.  We  come  up  with  a 
data  structure  which  we  call  a  splash.  A  splash  consists 
of  circular  groupings  of  surface  normals.  Such  a  splash 
captures  structural  surface  properties  in  a  way  that  we 
can  represent  them  by  sets  of  two  dimensional  structures 
called  super  segments.  Encoded  super  segments  provide 
the  mechanism  for  fast  matching.  The  acquisition  of  the 
three  dimensional  models  is  performed  automatically  by 
computing  splashes  in  highly  structured  areas  of  the  ob¬ 
jects.  For  every  model  all  splashes  are  mapped  on  super 
segments.  The  encoded  super  segments  are  recorded  in 
a  data  base.  The  scene  is  screened  for  highly  structured 
areas.  In  these  areas  splashes  are  computed  and  mapped 
on  super  segments.  The  encoded  super  segments  retrieve 
hypotheses  from  the  data  base.  Clusters  of  mutually  con¬ 
sistent  hypotheses  represent  instances  of  models.  The  lo¬ 
cation  of  the  instance  in  the  scene  is  found  by  applying 
a  least  squares  match  on  all  corresponding  points.  We 
present results with our current system TOSS (Three dimensional
Object recognition based on Super Segments)
and  discuss  further  extensions. 

1  Introduction 

In this paper we present an object recognition system
which is able to match general three dimensional objects
in an efficient way. By “three dimensional” we mean that
models and scenes have a three dimensional representation.
By “general objects” we mean that we do not make any
assumptions about the shape of the objects. Matching and
recognizing in an “efficient way” is based on a fast indexing
and retrieval system that has a complexity which grows as
O(kN), where N is the number of models and k ≪ 1.

Representing  a  three  dimensional  object  is  either  pos¬ 
sible  by  using  a  surface  or  a  volumetric  description.  Vol¬ 
umetric  descriptions  from  a  single  view  require  a  diffi¬ 
cult  inference  step  to  compensate  for  the  unseen  part,  so 
we  will  use  descriptions  based  on  visible  surface  instead. 
The  task  of  object  recognition  involves  identifying  a  cor¬ 
respondence  between  a  part  of  one  range  image  and  a 
part of another range image with a particular view of
a known object. This requires the ability to match one
surface patch of one range image against a surface patch
of another range image. The question is: “How can we
represent a surface patch so that it can be matched in
an efficient way?”

Reviewing the systems of the past, no system (known
to the authors) was able to represent, match, and recognize
general three dimensional objects. Most object
recognition systems to date either rely on exact, CAD-like
models, or make restrictive assumptions on the possible
shape of the surface patches.

1.1  Previous  Work 

Grimson and Lozano-Pérez [8, 9] describe a system which
is  able  to  recognize  objects  from  sparse  scene  data.  They 
exploit  geometric  constraints  to  prune  the  search  tree  of 
all  possible  matches  between  scene  data  and  model  data. 
Still,  the  number  of  combinations  that  need  to  be  tested 
grows  rapidly  with  object  complexity.  If  a  consistent 
transformation  is  found,  the  object  is  recognized. 

Bhanu [1] presents a 3D scene analysis system for the
shape  matching  of  real  world  3D  objects.  Object  models 
are  constructed  using  multiple-view  range  images.  The 
object  is  represented  as  a  set  of  planar  faces  approxi¬ 
mated  by  polygons.  Shape  matching  is  performed  by 
matching  the  face  description  of  an  unknown  view  with 
the  stored  model  using  a  relaxation-based  scheme  called 
stochastic  face  labeling. 

Horaud  and  Bolles  [2]  present  the  3DPO  system  for 
recognizing  and  locating  3D  parts  in  range  data.  The 
model  consists  of  two  parts:  an  augmented  CAD  model 
and  a  feature  classification  network.  The  model  objects 
are  represented  by  a  tree-like  network  such  that  each 
feature  contains  a  pointer  to  each  instance  in  the  CAD 
models.  A  local-feature-focus  method  is  used  for  the 
matching  process. 

Faugeras  and  Hebert  [7]  developed  a  system  to  recog¬ 
nize  and  locate  rigid  objects  in  3D  space.  Model  objects 
are  represented  in  terms  of  linear  features  such  as  points, 
lines, and planes. Range images are used as input.
First, possible pairings between model and scene features
are established and the transformation is estimated using
quaternions. Then, further matches are predicted and
verified  by  the  rigidity  constraints. 

Ikeuchi  [10]  developed  a  method  for  object  recognition 
in bin-picking tasks. The models consist of surface inertia,
surface relationship, surface shape, edge relationship,
extended Gaussian image, and surface characteristic
distribution. Since this system is mainly designed for the
task  of  bin-picking,  only  one  type  of  object,  which  is  the 
same  one  as  in  the  model,  appears  in  the  scene. 

Fan  [6,  5]  presents  a  system  which  takes  range  images 
as  input  and  automatically  produces  a  symbolic  descrip¬ 
tion  of  the  objects  in  the  scene  in  terms  of  their  visi¬ 
ble  surface  patches.  This  segmented  representation  may 
be  viewed  as  a  graph  whose  nodes  capture  information 
about  the  individual  surface  patches  and  whose  links  rep¬ 
resent  the  relationships  between  them.  The  matching  of 
a  scene  with  a  model  is  based  on  the  comparison  of  the 
two  graphs. 

With  3D-POLY  Chen  and  Kak  [4]  developed  a  system 
in  which  they  present  a  novel  approach  of  organizing  the 
feature  data  for  three  dimensional  objects.  They  present 
a  data  structure  which  they  call  feature  sphere.  The 
matching  and  verification  step  is  based  on  comparing 
spatial  relationships  of  special  feature  sets. 

The  closest  work  to  our  approach  was  done  by  Radack 
and Badler [11]. They introduce a new surface representation
called distance profile. These profiles are used for
the  matching  process.  This  method  reduces  the  match¬ 
ing  of  three  dimensional  surfaces  to  the  matching  of  two 
dimensional  curves.  They  use  points  with  high  curvature 
to  position  the  centers  of  the  distance  profiles. 

Many systems were developed which are based on different
assumptions about shape, such as polygonal shapes,
solids  of  revolution  or  generalized  cylinders.  In  contrast, 
we  believe  that  our  proposed  system  TOSS  (Three  di¬ 
mensional  Object  recognition  based  on  Super  Segments) 
is  able  to  recognize  rigid  objects,  whose  shapes  are  not 
constrained  by  any  simplifying  assumptions.  Our  algo¬ 
rithm  uses  a  representation,  which  is  designed  to  capture 
the  structure  (curvature)  of  a  surface  patch  and  allows 
fast  matching. 

1.2  Our  Previous  Work 

This  paper  describes  a  continuation  of  our  early  work  [12] 
which  addresses  the  problem  of  recognition  of  multiple 
flat  objects  in  a  cluttered  environment  from  an  arbitrary 
viewpoint  (weak  perspective).  The  models  are  acquired 
automatically  and  initially  approximated  by  polygons 
with  multiple  line  tolerances  for  robustness.  Groups  of 
consecutive  linear  segments  (super  segments)  are  then 
quantized  with  a  Gray  code  and  entered  into  a  hash  ta¬ 
ble.  This  provides  the  essential  mechanism  for  indexing 
and  fast  retrieval.  Once  the  data  base  of  all  models  is 
built,  the  recognition  proceeds  by  segmenting  the  scene 
into  a  polygonal  approximation;  the  Gray  code  for  each 
super  segment  retrieves  model  hypotheses  from  the  hash 
table.  Hypotheses  are  clustered  if  they  are  mutually 
consistent,  and  represent  the  instance  of  a  model.  Fi¬ 
nally,  the  estimate  of  the  transformation  is  refined.  This 
methodology  allows  us  to  recognize  models  in  the  pres¬ 
ence  of  noise,  occlusion,  scale,  rotation,  translation  and 
weak perspective. Unlike most of the current systems,
its complexity grows as O(kN), where N is the number
of models and k ≪ 1.


1.3  Plan 

The  paper  is  organized  as  follows:  Section  2  introduces 
our  basic  feature,  the  splash.  It  consists  of  a  group  of 
surface  normals  which  are  mapped  on  a  two  dimensional 
structure.  We  show  how  we  can  use  our  two  dimensional 
approach  to  represent  these  two  dimensional  structures 
in  a  way  that  allows  us  to  match  the  three  dimensional 
splashes.  Section  3  focuses  on  the  representation  of  a 
general  three  dimensional  object,  the  matching  and  the 
verification  process.  In  Section  4  we  show  results  of  our 
current  implementation. 

2  The  Splash 

Extending  the  two  dimensional  basic  feature  of  the  super 
segment  to  three  dimensions  to  obtain  a  feature  which 
represents  surfaces  is  awkward:  the  polygonal  approxi¬ 
mation  of  a  two  dimensional  boundary  has  a  property 
which  is  crucial  for  the  super  segment  idea,  but  which 
is  not  extendable  to  higher  dimensions:  the  well  defined 
order  of  the  neighborhood  of  a  linear  segment.  Every 
segment  on  a  two  dimensional  polygon  has  two  adja¬ 
cent  neighbor  segments.  Based  on  this  fact,  super  seg¬ 
ments  can  be  generated  by  grouping  adjacent  segments 
together.  In  three  dimensions  this  ordered  neighborhood 
property  does  not  exist.  Linear  or  other  segmentations 
of  a  surface  (or  volume)  lead  in  general  to  patches  which 
can  have  any  number  and  order  of  neighbor  patches. 
This  is  a  reason  why  we  decided  not  to  go  the  path  of  a 
linear  (or  higher  order)  surface  segmentation  to  obtain  a 
representation  for  matching  and  recognition.  What  are 
the  requirements  that  a  representation  for  general  three 
dimensional  objects  has  to  meet?  We  want  the  represen¬ 
tation  to  be 

1.  translation  invariant, 

2.  rotation  invariant, 

3.  general,  in  that  we  do  not  have  to  make  any  assump¬ 
tions  about  the  shape  of  the  object, 

4.  local  enough,  so  that  we  can  handle  occlusion, 

5.  robust  enough,  so  that  we  can  handle  noise. 

In  the  following  we  will  use  lower  case  to  describe  vec¬ 
tors  (n,  p. ..),  and  upper  case  to  describe  coordinate 
frames  (N,  0...).  The  basic  feature  for  representing  a 
general  surface  patch  is  the  splash.  The  name  originates 
from  the  famous  picture  of  Professor  Edgerton  (MIT), 
showing  a  milk  drop  falling  into  milk  (see  Figure  1  (a)). 
This  picture  bears  a  resemblance  to  the  normals  in  our 
basic  feature.  A  splash  is  best  described  by  Figure  1  (b). 
At  a  given  location  p  we  determine  the  surface  normal  n. 
We  call  this  normal  the  reference  normal  of  a  splash.  A 
circular slice around n with the surface radius $\rho$ is computed.
Starting at an arbitrary point on this surface
circle, a surface normal is determined at every point on
the circle. In practice we walk around the reference normal
in angular steps $\Delta\theta$ (typically $1^\circ < \Delta\theta < 15^\circ$) and obtain
a set of sample points on the surface circle. The normal
at the angle $\theta$ is called $n_\theta$. A super splash is composed of
splashes with different surface radii $\rho_i$ with $i \in \{1, \ldots, m\}$,
where m is the number of splashes in a super splash.
Figure 1: Splashes. (a) Milk Splash.


We  compute  a  normal  in  our  system  by  approximating 
the  environment  of  a  normal  with  triangles  of  small  sizes. 
Every  triangle  votes  for  a  triangle  normal.  The  average 
of  the  three  closest  triangle  normals  is  the  surface  nor¬ 
mal.  This  is  a  very  rough  method,  but  the  results  were 
always  good  enough  for  our  approach. 
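
The normal estimate can be sketched as follows. This is a minimal sketch, assuming that the triangles are given as triples of 3D points with a consistent vertex order and that "closest" is measured from the triangle centroid to the query point; both are assumptions, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def triangle_normal(tri):
    """Unit normal of a triangle given as three 3D points."""
    p0, p1, p2 = tri
    return _unit(_cross(_sub(p1, p0), _sub(p2, p0)))


def surface_normal(point, triangles, k=3):
    """Average the normals of the k triangles closest to the point."""
    def centroid(tri):
        return tuple(sum(c) / 3.0 for c in zip(*tri))

    nearest = sorted(triangles,
                     key=lambda tri: math.dist(centroid(tri), point))[:k]
    summed = [sum(c) for c in zip(*(triangle_normal(t) for t in nearest))]
    return _unit(summed)
```

Averaging only three triangle normals keeps the estimate local, which matches the remark that the method is rough but sufficient.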

The frame $N_\theta$ (see Figure 2 (a)) is defined in the following
way:

1. The surface normal n is the z axis.

2. At every location of $n_\theta$, the location of the reference
normal p and the tip of the reference normal n + p
describe a plane E. The x axis is defined as the
vector which is perpendicular to n and lies in the
plane E. Furthermore the angle between the x axis
and a vector r which is defined between the origin
of the frame and the location of $n_\theta$ has to be in the
interval [-90°, 90°].

3.  The  y  axis  is  perpendicular  to  the  x  and  the  z  axis 
in  a  right  handed  coordinate  system. 

Figure 2: $n$ and $n_\theta$

This frame has the property that the xy-plane always
approximates the tangent plane of the surface in p. We
represent $n_\theta$ in spherical coordinates: we compute the
two angles $\phi_\theta$ and $\psi_\theta$.

<j>e  =  angle( n,  n^-0) 
i!>e  =  angle(i\g=0 ,  n$). 
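
The frame construction and the two angles can be sketched in code. This is a minimal sketch, assuming the usual spherical convention in which $\phi$ is the polar angle of the sampled normal measured from the z axis and $\psi$ is the azimuth of its projection into the xy-plane measured from the x axis; the paper's exact angle definitions may differ. The function names are hypothetical.

```python
import math


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(v, s):
    return (v[0] * s, v[1] * s, v[2] * s)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    length = math.sqrt(_dot(v, v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return _scale(v, 1.0 / length)


def splash_frame(p, n, q):
    """Axes (x, y, z) of the local frame for the sample location q."""
    z = _unit(n)
    r = _sub(q, p)
    # The component of r perpendicular to n lies in the plane spanned by
    # n and r and makes an angle of at most 90 degrees with r.
    x = _unit(_sub(r, _scale(z, _dot(r, z))))
    y = _cross(z, x)  # completes a right-handed system
    return x, y, z


def spherical_angles(p, n, q, n_theta):
    """Polar angle phi and azimuth psi (degrees) of n_theta in the frame."""
    x, y, z = splash_frame(p, n, q)
    m = _unit(n_theta)
    phi = math.degrees(math.acos(max(-1.0, min(1.0, _dot(m, z)))))
    psi = math.degrees(math.atan2(_dot(m, y), _dot(m, x)))
    return phi, psi
```

Under this convention, sampling a sphere on a circle around the reference normal yields the same pair of angles for every sample point.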

For every sample point of a splash we obtain such a tuple.
Plotting $\phi$ and $\psi$ with respect to $\theta$ results
in two mappings as in Figure 3(a).

Figure 3: Mapping Tuple. (a) $\phi$ and $\psi$ mapping with respect to $\theta$; (b) polygonal approximation of the $\phi$ and $\psi$ mappings.

These two mappings have the following properties:

1. Depending on where $n_{\theta=0}$ is, the mappings are shifted
along the $\theta$ axis.

2. The variation of the curve represents the structural
change in the surface environment around the reference
normal n.

(a)  For  a  splash  on  a  sphere  or  plane,  the  mappings 
are  constant. 

(b)  A  highly  creased  surface  results  in  a  curved  set 
of  mappings. 

3. Splashes which are located close to each other have
similarly shaped sets of mappings. By similar we mean
that a human would classify them as “pretty much the same”.
That does not automatically imply that the pairwise
difference results in small values. To be able to
compare  two  mappings,  we  therefore  need  a  metric. 
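
The first property can be illustrated with a short example. This is a minimal sketch, assuming that a mapping is stored as a plain list of samples over one full turn; the helper name is hypothetical.

```python
def rotate_to_maximum(samples):
    """Circularly shift a periodic sequence so it starts at its first maximum."""
    start = samples.index(max(samples))
    return samples[start:] + samples[:start]


# The same periodic mapping sampled from two different starting normals.
mapping_a = [0, 1, 3, 2, 1]
mapping_b = mapping_a[2:] + mapping_a[:2]
```

Both `rotate_to_maximum(mapping_a)` and `rotate_to_maximum(mapping_b)` give `[3, 2, 1, 0, 1]`, so the choice of starting point carries no shape information. If the maximum occurs more than once, the starting point is ambiguous and each maximum has to be considered.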

At  this  point  we  have  reduced  the  original  question  “How 
do  we  capture  the  shape  of  a  general  surface  patch  into  a 
representation?”  which  is  a  three  dimensional  problem, 
into  a  two  dimensional  question  “How  do  we  capture  the 
shape  of  two  mappings  into  a  representation?”. 


Figure 4: Representation of a Model (levels shown: Super Splashes, Splashes, Linear Approximations)
3.1 Object Representation

The solution is straightforward based on our two dimensional
approach [12].

1. For all splashes of a model we compute the mappings.
In Section 3.4, we talk about the locations of
the splashes.

2. For each splash the two mappings are approximated
by polygonal approximations (see Figure 3(b)). It is
important to note that the mappings (and therefore
the polygons) are periodic (0° = 360°). For the
purpose of robustness we use multiple line fitting
tolerances. Therefore we get a set of polygons for
every mapping.

3. For every polygonal approximation we compute a
super segment. The start of the super segment is
defined at its global maximum value. If there is
more than one global maximum we use one super
segment for each of the maxima. With this super
segment choice, we obtain rotational invariance in
our representation. By starting all super segments
at the maximum of the approximation, two shifted
polygons with the same shape result in the same
super segment.

4. All the obtained super segments are encoded. The
encoding works as described in [12]. As encodable
attributes we take

(a) the angles between two consecutive segments
of a super segment (they capture the curvature
information)

(b) the mapping label $\phi$ or $\psi$

(c) the maximum of the mapping ($\phi_{max}$ or $\psi_{max}$)

(d) the surface radius of the splash.

Incorporated in the code of the angles of the super
segments is also the cardinality (number of segments)
of the super segments (by the number of
angles). That avoids matching super segments of
different cardinality. The encoding of $\phi_{max}$ or $\psi_{max}$
allows us to distinguish between differently curved surfaces
of the same shape (e.g. two spherical surfaces
with different sphere radii). The encoding of the radius
avoids matches between splashes with different
splash radii.

5. All the encoded super segments serve as keys into
a hash table (the data base), where we record the
corresponding splashes as entries (see Figure 4).

3.2 Matching

By using indexing for the matching process, we only select
a small set of candidate models that are likely to be
present in the scene. We assume that most objects in
our data base (hash table) are redundantly specified by
their splashes. The scene is preprocessed as explained in
Section 3.1 to generate all the splashes and their super
segments. The encoded super segments are used to retrieve
the matching hypotheses between the splashes of
model and scene.


3.3  Verification 

The  verification  stage  is  fully  described  in  [12]  and  con¬ 
sists  of  the  following  steps: 

1.  We  compute  all  possible  matches  for  the  splashes 
of  the  scene  with  the  model  splashes  to  generate 
multiple  hypotheses. 

2. These hypotheses are stored with respect to the
model they vote for.

3. The next step is the formation of consistent clusters
based on angle and distance constraints. A cluster
of mutually consistent hypotheses represents an
instance of a model.

4.  After  this  grouping  of  hypotheses  into  clusters,  we 
can  compute  the  transformation  from  the  model 
coordinates  to  the  scene  coordinates  by  applying 
a  least  squares  calculation  on  all  the  matching 
splashes.  Because  of  noise,  we  get  in  general  a  good 
first  guess  for  the  transformation  but  not  an  ex¬ 
act  match.  A  second  least  squares  match  on  corre¬ 
sponding  corners  or  segments  can  refine  the  result. 
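
Steps 2 and 3 can be sketched as follows. This is a minimal sketch, assuming that a hypothesis is a tuple of model identifier, model splash location and scene splash location, and using only a distance constraint with a greedy grouping strategy. The angle constraint is omitted, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def group_by_model(hypotheses):
    """Store hypotheses with respect to the model they vote for."""
    groups = defaultdict(list)
    for hyp in hypotheses:
        groups[hyp[0]].append(hyp)
    return dict(groups)


def consistent(h1, h2, tol):
    """A rigid motion preserves the distance between two splash locations."""
    _, m1, s1 = h1
    _, m2, s2 = h2
    return abs(math.dist(m1, m2) - math.dist(s1, s2)) <= tol


def greedy_clusters(hyps, tol):
    """Put each hypothesis into the first cluster it is consistent with."""
    clusters = []
    for hyp in hyps:
        for cluster in clusters:
            if all(consistent(hyp, other, tol) for other in cluster):
                cluster.append(hyp)
                break
        else:
            clusters.append([hyp])
    return clusters
```

The largest cluster for a model is a candidate instance, and its matched locations are the input to the least squares step. A greedy pass depends on the order of the hypotheses, so it is only an approximation of a full search for mutually consistent sets.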

3.4  Interest  Operator 

One  question  remains  open:  at  which  locations  of  an  ob¬ 
ject  should  we  compute  the  splashes?  The  brute  force 
answer  would  be:  at  every  pixel  (in  a  range  image).  A 
more  sophisticated  answer  would  include  the  observation 
that  we  will  not  get  structurally  rich  splashes  at  every 
point,  which  lead  to  good  and  unambiguous  matches. 
Splashes  in  flat  areas  result  in  super  segments  with  low 
cardinality.  Super  segments  with  low  cardinality  are  less 
descriptive  than  super  segments  with  high  cardinality, 
which represent highly structured surface patches. Therefore,
to obtain good and unique matches we are interested
in matches of structured patches and high cardinality.
These  can  be  found  at  or  near  points  of  high  curvature. 


Figure  5:  Interest  Operator 

Our  simple  selection  method  works  as  follows  (see  Fig¬ 
ure  5): 

1. We compute the edges of the artificially shaded range
image  (by  assuming  a  light  source  at  the  viewer  and 
computing  a  gray  value  for  every  pixel  in  the  range 
image  under  the  assumption  of  Lambertian  condi¬ 
tions)  with  the  Canny  edge  detector  [3].  We  want 
to  position  the  splashes  in  areas  where  we  can  ex¬ 
pect  structured  patches  on  one  object.  This  prop¬ 
erty  is  not  given  on  the  boundary.  A  boundary 
edge  typically  has  the  object  as  one  neighborhood 
and  other  objects  or  background  information  as  the 
other  neighborhood.  Therefore  we  use  only  the  “in¬ 
ner  object  edges”  and  throw  away  the  boundary 
edges. 

2.  For  positioning  the  splashes  we  are  interested  in  ar¬ 
eas  around  the  edges.  Placing  a  splash  on  a  high 
curvature  point  has  the  disadvantage  of  an  unreli¬ 
able  reference  normal.  A  reliable  reference  normal 
is important for a stable splash. Nevertheless we
want to capture the structure of the edges in the
splash. Therefore the best place for a splash is in
the neighborhood of an edge. We get this area in 3
steps:

(a) We dilate the edge image by replacing every
pixel on the edges by a disc of a certain radius
(e.g. $r_1 = 8$ pixels). The resulting image is
called dilatation 1.

(b) We dilate the edge image with another radius
(e.g. $r_2 = 3$ pixels with $r_1 > r_2$). The resulting
image is called dilatation 2.

(c) Subtracting dilatation 2 from dilatation 1
gives us a mask. This mask describes
an area with the above described characteristics.
Points in this mask are not high curvature
points, but they are close to edges.

3.  We  compute  a  grid  of  splashes  on  the  range  image 
with  respect  to  this  mask. 

As  we  will  see  in  Section  4,  this  simple  method  works 
pretty  well. 
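
The mask construction and the grid placement can be sketched with plain sets of pixel coordinates. This is a minimal sketch, assuming that the inner object edges are already available as a set of integer (x, y) pixels and ignoring image borders; the grid spacing `step` and the function names are hypothetical.

```python
def dilate(edge_pixels, radius):
    """Replace every edge pixel by a disc of the given radius (in pixels)."""
    offsets = [(dx, dy)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               if dx * dx + dy * dy <= radius * radius]
    return {(x + dx, y + dy) for (x, y) in edge_pixels for (dx, dy) in offsets}


def splash_mask(edge_pixels, r1=8, r2=3):
    """Pixels near an edge (within r1) but not on or next to it (within r2)."""
    if r1 <= r2:
        raise ValueError("r1 must be larger than r2")
    return dilate(edge_pixels, r1) - dilate(edge_pixels, r2)


def grid_points(mask, step):
    """Regular grid of candidate splash locations restricted to the mask."""
    return sorted((x, y) for (x, y) in mask
                  if x % step == 0 and y % step == 0)
```

For a single edge pixel the mask is a ring around it, and for an edge curve it is a band on both sides of the curve. The reference normals are therefore taken where the surface is locally smooth while the splash circle can still cross the edge.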

3.5  Complexity  Analysis 

In  [12]  we  show  that  for  the  two  dimensional  case,  under 
the  assumptions  that  every  model  has  the  same  num¬ 
ber  of  super  segments  and  that  the  entries  are  equally 
distributed  over  the  hash  table,  the  overall  cost  is 

$O_{\text{recognize}} = O_{\text{match}} + O_{\text{verify}} = O(q) + O(M) = O(M)$,

where  M  is  the  number  of  models  in  the  data  base  and  q 
the  number  of  super  segments  in  the  scene.  We  assume 
q  constant  to  study  the  behavior  of  a  large  data  base 
(large  M).  We  show  further  that  the  slope  of  the  linear 
cost  function  is  dependent  on  the  occurrence  of  a  model 
in  the  scene  or  its  absence.  In  our  results  for  the  two 
dimensional  case  the  cost  for  detecting  the  absence  of  a 
model  is  less  than  10%  of  the  cost  for  the  detection  of 
the  occurrence.  In  the  three  dimensional  case,  we  map 
each  splash  onto  a  constant  number  of  super  segments. 
Therefore  we  claim  that  this  result  is  also  valid  for  the 
three  dimensional  case.  To  support  this  claim,  we  cre¬ 
ated  a  data  base  of  100  objects.  Every  object  consists 
of  a  random  range  image.  The  scene  is  a  composition 
of  four  of  these  objects  including  translation,  rotation, 
and  occlusion.  In  our  results  the  cost  for  detecting  the 
absence of a model is less than 50% of the cost for the
detection  of  the  occurrence.  We  believe  that  the  cause 
for  the  higher  relative  cost  of  absence  detection  in  three 
dimensions  compared  to  two  dimensions  lies  in  the  fact 
that  splashes  for  surfaces  are  less  descriptive  than  super 
segments  for  boundaries. 

4  Results 

With the current implementation we are able to show that the proposed mechanism for recognizing general three dimensional objects works. We choose two different scenes:

1.  A  Mozart  bust,  which  is  highly  curved,  and  which 
partially  has  a  structured  surface.  Because  of  lack 
of  data  we  cannot  deal  with  a  real  three  dimensional 
model and a scene which consists of range data.
Therefore  we  take  the  original  data  of  the  Mozart 
bust  as  model,  rotate  the  range  data  synthetically 
to  obtain  the  scene.  We  rotate  pixel  by  pixel  and  fill 
the  holes  by  averaging  the  values  of  neighbor  pixels. 
This  rotation  process  is  guaranteed  to  add  a  lot  of 
noise! 

2.  A  scene  composed  of  a  plane  and  a  wagon,  which 
shows  that  our  method  works  for  objects  which  can 
be  approximated  by  polygonal  surfaces.  We  have 
four  range  images,  two  of  the  plane  from  different 
views  and  two  of  the  wagon  from  different  views. 
One  wagon  and  one  plane  image  serve  as  models. 
The  scene  is  composed  synthetically  by  combining 
the  other  two  range  images  into  the  scene  image. 

4.1  Mozart 

Our  input  data  is  the  range  image.  For  better  visibil¬ 
ity  we  show  the  artificially  shaded  images.  Figure  6(a) 
shows  the  model  of  the  Mozart  bust,  Figure  6(d)  shows 
the  scene,  which  is  the  model  rotated  by  20  degrees 
around  a  tilted  axis.  The  inner  edges  are  shown  in  Fig¬ 
ure  6(b)  and  (e).  The  results  of  our  interest  operator  are 
the  masks  shown  in  Figure  6(c)  and  (f).  The  recognition 
result is shown in Figure 7. (We overlaid a grid of the range image of the model, transformed by the resulting transformation, on top of Figure 6(d).)

4.2  Plane  and  Wagon 

Figure  8(a)  shows  the  shaded  range  image  of  the  plane 
and  Figure  8(b)  shows  the  shaded  range  image  of  the 
wagon.  Figure  9  shows  the  best  detected  solution.  It  is 
interesting  to  note  that  the  plane  has  no  highly  struc¬ 
tured  areas  on  the  wings.  Therefore  we  get  no  matches 
on  the  wings.  This  leads  to  poor  correspondence  for  the 
wings  (we  have  one  degree  of  freedom  along  the  axis  of 
the  plane). 


object    size (pixels)   # super splashes   radii (pixels)   tolerances (degrees)   grid (pixels)
mozart    264x399         402                20,30,40         15,20,25,30            6
scene 1   240x390         329                20,30,40         15,20,25,30            8
plane     507x218         60                 20,30,40         15,20,25,30            8
wagon     422x178         367                20,30,40         15,20,25,30            6
scene 2   458x324         293                20,30,40         15,20,25,30            8

Table 1: Table of Objects

We  can  give  some  rough  numbers  about  the  time  com¬ 
plexity  (on  a  serial  Symbolics  machine). 

1.  The  acquisition  of  one  super  splash  (consisting  of 
typically  3  splashes  with  4  line  fitting  tolerances) 
takes  about  7  seconds. 

2.  The  Mozart  bust  consists  of  about  400  super 
splashes,  therefore  it  took  about  45  minutes  to  com¬ 
pute  all  splashes. 


3. The plane consists of about 60 super splashes, therefore it took about 7 minutes to compute all splashes.

4.  The  matching  process  is  always  below  20  seconds. 

5.  The  verification  process  takes  less  than  3  minutes. 

Items 1 to 3 are processed offline to build the data base. The recognition process itself consists of items 4 and 5. All these numbers reflect neither the high parallelism which is theoretically possible nor the data redundancy with which we work at the moment. Simple improvements can significantly increase the performance. This is the goal of our future studies.

5  Conclusion  and  Future  Research 

The  results  with  our  current  implementation  of  the 
TOSS  system  described  in  the  last  sections  show  that 
the  idea  of  describing  the  surface  of  an  object  based  on 
splashes  is  powerful  enough  to  handle  complex  three  di¬ 
mensional  shapes.  Our  future  research  will  extend  vari¬ 
ous  aspects  of  this  mechanism.  Our  system  is  designed 
for  the  recognition  of  surface  patches.  Several  other 
pieces  of  information  are  not  used  to  enhance  the  perfor¬ 
mance.  One  is  for  example  the  boundary  of  an  object. 
There  might  be  a  possibility  of  including  the  different 
boundaries  derived  from  different  views  of  the  object  by 
including  two  dimensional  super  segment  recognition  in 
the  three  dimensional  recognition  process.  We  have  to 
study  this  possibility.  Our  long  term  goal  is  to  build  a 
recognition  system  which  is  able  to  recognize  three  di¬ 
mensional  models  in  a  two  dimensional  gray  level  image. 
Figure 6: Mozart Scene. (f) Scene 1: Interest Mask

Figure 7: Scene 1: Result of Recognition

Figure 8: Plane and Wagon Scene. (a) Plane: Shaded Range Image (b) Wagon: Shaded Range Image (c) Scene 2: Shaded Range Image

Figure 9: Scene 2: Result of Recognition
Abstract

We propose a technique based on the analysis of symmetries in an image to infer the 3-D shape of the surfaces of objects in it. This technique is analyzed and applied to Zero Gaussian Curvature surfaces in detail. The method consists of deriving a number of constraints based on a few simple assumptions. The combination of constraints to give unique (or few) solutions is discussed. Experimental results on selected scenes are given and are in good conformity with human perception. The techniques are based on concepts presented in an earlier paper and fill in important gaps in the earlier theory.

1  Introduction 

Inferring the shape of the surfaces in a scene from a single line drawing is an important and difficult problem in computer vision. The early work on inferring 3-D structure from a 2-D shape was focused on analysis of line drawings of polyhedra [Huf71, Clo71, Mac73, Kan81, Sug86]. There have also been some attempts at developing techniques for curved surfaces such as [BT81, Ste81, XT87, HB88]. Due to limited space, we will omit a critical analysis of previous methods. Mostly, they are not applicable to the types of scenes we analyze, except perhaps the paper by Horaud and Brady [HB88].

We  propose  a  technique  based  on  the  analysis  of  symmetries  in  a  scene.  Our  basic 
concepts  were  first  presented  in  [UN88].  In  that  paper,  we  described  some  basic  con¬ 
straints  that  derive  from  observation  of  symmetry  and  other  properties  of  line  drawings. 
At  that  time,  we  believed  that  the  constraints  were  sufficient  to  infer  unique  surface 
orientations  for  a  certain  class  of  objects.  However,  it  turns  out  that  even  though  the 
number  of  constraint  equations  can  exceed  the  number  of  unknowns,  a  unique  solution  is 
not  guaranteed.  This  paper  is  about  how  additional  information  can  be  utilized  to  infer 
unique  (or  a  small  set  of)  surface  shapes.  In  the  process,  we  have  also  generalized  the 
constraints  in  several  ways.  Our  method  has  been  validated  by  comparison  with  human 
perception. 

Throughout the paper we will assume orthographic projection, with the image plane being the x-y plane. Therefore a point (x, y, z) in 3-D projects as the point (x, y) on the image plane.

In section 2, we define two types of symmetries that we find useful and the qualitative shape inferences we can draw from them. Section 3 contains our approach to quantitative shape inference. We focus particularly on the analysis of objects containing "Zero-Gaussian Curvature" (ZGC) surfaces. We also give experimental results from our implementation on some selected objects.

*This research was supported by the Defense Advanced Research Projects Agency under contract number F33615-87-C-1436 monitored by the Air Force Wright Aeronautical Laboratories, DARPA Order No. 3119.

2  Qualitative  Shape  Inference 

We believe that symmetries have an important role in shape perception. This has also been noted and used by many researchers [NB77, Nal87, Rao88, Kan81, Ste81]. We define two types of symmetries, which we call parallel symmetry and mirror symmetry, and discuss how they can be used to infer surface shape. For curves to be symmetric (parallel or mirror), certain point-wise correspondences between the two curves must exist. We will call the lines joining the corresponding points on the curves the lines of symmetry, the locus of the mid points of these lines the axis of symmetry, and the curves forming the symmetry the curves of symmetry.

Parallel Symmetry Let $X_i(s) = (x_i(s), y_i(s))$, for $i = 1, 2$, be two curves parameterized by arc length $s$.

Let $\theta_i(s) = \arctan\big((dy_i(s)/ds)/(dx_i(s)/ds)\big)$. Then, $X_1(s)$ and $X_2(s)$ are said to be parallel symmetric if there exists a point-wise correspondence $f(s)$ between them such that $\theta_1(s) = \theta_2(f(s))$ for all values of $s$ for which $X_1$ and $X_2$ are defined, and $f(s)$ is a continuous monotonic function. Note that computing symmetry between two curves using this definition requires estimating the function $f(s)$ as well. A useful special case is when $f(s)$ is restricted to be a linear function.
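The special case of a linear correspondence can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes the two curves are given as polylines, that the linear $f$ maps the whole of one curve onto the whole of the other (so both are simply resampled at the same fractions of their arc length), and it uses `arctan2` instead of `arctan` so that tangent directions are compared modulo $2\pi$. The names and the tolerance are hypothetical.

```python
import numpy as np


def resample_by_arc_length(curve, n):
    """Resample a polyline at n points equally spaced in arc length."""
    curve = np.asarray(curve, dtype=float)
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(t, s, curve[:, 0]),
                            np.interp(t, s, curve[:, 1])])


def tangent_angles(curve):
    """Direction of each chord of the polyline, in radians."""
    d = np.diff(curve, axis=0)
    return np.arctan2(d[:, 1], d[:, 0])


def is_parallel_symmetric(curve1, curve2, n=200, tol=0.05):
    """Test theta_1(s) = theta_2(f(s)) for a linear f mapping curve1 onto curve2."""
    a1 = tangent_angles(resample_by_arc_length(curve1, n))
    a2 = tangent_angles(resample_by_arc_length(curve2, n))
    diff = np.angle(np.exp(1j * (a1 - a2)))  # wrap differences to (-pi, pi]
    return bool(np.max(np.abs(diff)) < tol)
```

A curve and a scaled, translated copy of it pass this test, which matches the observation made later that a linear correspondence function arises for conic surfaces.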

Mirror  Symmetry  For  mirror  symmetry,  the  point-wise  correspondence  should  be 
such  that  the  axis  of  the  symmetry  is  straight,  and  the  lines  of  symmetry  are  at  a  constant 
angle  (not  necessarily  orthogonal)  to  the  axis  of  symmetry.  This  definition  of  the  mirror 
symmetry  is  similar  to  that  of  skew  symmetry.  We  use  the  term  mirror  symmetry  in 
the  context  of  curved  surfaces  as  skew  symmetry  has  historically  been  used  for  planar 
surfaces  only. 

We  now  describe  some  qualitative  inferences  about  the  shape  of  surfaces  from  their 
symmetries.  Our  inferences  are  based  on  the  assumption  of  general  viewpoint  defined  as: 
Definition  1  General  Viewpoint  :  A  scene  is  said  to  be  imaged  from  a  general  view¬ 
point,  if  perceptual  properties  of  the  image  are  preserved  under  slight  variations  of  the 
viewing  direction. 

Specifically,  the  properties  we  are  interested  in  are:  straightness  and  parallelity  of  lines 
and  symmetry  of  curves. 

It  will  be  useful  to  consider  figures  as  belonging  to  one  of  the  following  three  classes: 

Case  I:  Here,  one  symmetry  covers  the  entire  boundary  of  the  surface  (though  more 
than  one  such  description  may  be  possible).  Figure  1  shows  two  examples.  It  can  be 
shown  that  this  symmetry  must  be  a  mirror  symmetry.  We  now  show  that  such  surfaces 
must  be  planar  under  general  viewpoint  assumption. 

Theorem  1  If  a  surface,  bounded  by  real  edges  (i.e. edges  that  do  not  change  with  the 
viewpoint),  produces  a  line  drawing  in  the  image  plane  which  belongs  to  case  I,  then  the 
surface  must  be  planar  (under  the  assumption  of  general  viewpoint). 

Proof: The assumption of general viewpoint implies that parallel lines in the image plane must be the projection of parallel 3-D lines, otherwise they would not project parallel from another viewpoint. Therefore we conclude that the 3-D lines, say $l_i$, that project as the lines of mirror symmetry on the image plane must be parallel to each other in 3-D, because lines of mirror symmetry are parallel to each other in the image plane. The axis of symmetry in 3-D, which can be obtained by joining the midpoints of the 3-D lines $l_i$, has to be straight because its projection on the image plane, which is the axis of mirror symmetry, is straight (by definition). Therefore, the lines $l_i$ have to lie on a plane, because they are parallel to each other and a single line, the 3-D axis of symmetry, intersects them. Hence the 3-D surface, which contains the lines $l_i$, is planar. □

(a) (b)

Figure 1: Two examples of case I.

Case II: Here, the boundary of the figure is covered by exactly two symmetries, at least one of which must be a parallel symmetry. We will argue that case II figures are the ones that give us the most information about the surface shape and that such cases are common in scenes of everyday experience. Figure 2 shows some examples of this case. The type of surface we perceive depends on the properties of the symmetries. We consider the case of Zero Gaussian Curvature (ZGC) surfaces first.

Lemma  1  Parallel  symmetric  curves  in  the  image  plane  must  be  projections  of  parallel 
symmetric  curves  in  3-D  if  imaged  from  a  general  viewpoint. 

Proof: This is an extension of the result that parallel lines on the image plane must be the projection of parallel 3-D lines from a general viewpoint. Two curves whose tangents are parallel continuously on the image plane must be the projection of two 3-D curves whose tangents are parallel at the same points as in the image plane, from a general viewpoint.

Theorem 2 If a surface generates one parallel symmetry and one mirror symmetry, with straight curves of mirror symmetry, on the image plane, and the straight curves of mirror symmetry are also the lines of symmetry for parallel symmetry, then the surface must be a Zero Gaussian Curvature (ZGC) surface (assuming general viewpoint and assuming that the surface does not have any variations or fluctuations that do not produce any edges in the image plane).

Proof: Using lemma 1 we conclude that the 3-D curves producing parallel symmetry on the image plane must be parallel symmetric. Since the mirror symmetry curves are straight on the image plane, the corresponding 3-D curves must also be straight. That is, the surface embeds straight lines. Also, the lines that join corresponding points on the 3-D parallel symmetry curves must be on the 3-D surface (rulings of the surface), otherwise the surface would not produce the same set of symmetries from another viewpoint. Consider the normals of the surface at the points where a ruling intersects the 3-D parallel symmetry curves, that is, the normals $N_1$ and $N_2$ in figure 3. Since the tangents $t_1$ and $t_2$ are the same and of course the tangent of the ruling is constant along it, the surface normals $N_1$ and $N_2$, which are the cross products of $t_1$ and $t_2$ with the tangent of the ruling, must be the same. Therefore the tangent plane of the surface is constant along the ruling, from the assumption that the surface does not have any variation that does not produce any edge. Hence the Gaussian curvature of the surface is zero. □

Figure 2: Examples of case II surfaces.

Figure 3: A ZGC surface with a ruling on it.

It  follows  that  if  the  parallel  symmetry  has  a  linear  correspondence  function  then 
the  surface  is  conic,  and  if  the  correspondence  function  is  an  identity  then  the  surface 
is  cylindrical.  A  quantitative  analysis  for  ZGC  surfaces  is  presented  in  the  following 
sections. 

If the curves of mirror symmetry are not straight, or if the figure has two curved parallel symmetries, then we perceive a doubly curved surface (i.e. both of the principal curvatures are non zero). The rightmost surface in figure 2 is an example of such a doubly curved surface.

Case  III:  This  class  includes  all  remaining  cases.  We  believe  that  in  such  cases,  surface 
shape  inferences  can  not  be  made  directly  from  the  given  boundaries.  Either  no  distinct 
shape  is  perceived,  or  shape  inference  assumes  the  existence  of  some  boundaries  that 
have  been  omitted.  Figure  4  shows  an  example.  We  will  not  further  consider  such  figures 
in  this  paper. 

3  Quantitative  Shape  Recovery 

We now discuss quantitative shape recovery of zero-Gaussian curvature surfaces. A Zero Gaussian Curvature (ZGC) surface is one where the Gaussian curvature (the product of the maximum and minimum curvatures) of the surface is zero everywhere. Cylinders and cones are examples of ZGC surfaces. Our analysis uses the following two theorems that apply to ZGC surfaces:

Theorem 3 Curves obtained by intersecting a ZGC surface with two parallel planes, called the cross section planes, are parallel symmetric and the lines of symmetry are the rulings of the surface.


Figure  4:  (a)  A  figure  with  two  mirror  symmetries,  (b)  addition  of  an  extra  curve  clarifies 
the  perceived  shape 


(a) (b) (c)

Figure  5:  An  example  Zero  Gaussian  Curvature  surface;  (a)  the  boundaries  only,  (b)  with  the 
rulings,  (c)  and  with  axis  of  symmetry 

Proof of this theorem is given in appendix A.1. Note that this theorem does not guarantee that parallel symmetry curves are necessarily planar. However, they do not occur by accident. For example, to obtain parallel symmetric curves from a conic surface by cutting with non planar cross sections, the cuts must be translated along the axis of the cone and scaled exactly with the scaling function of the cone. Planarity of cross sections may be confirmed by the mirror symmetry of the cross section, or a segmentation into planar sections may be indicated by mirror symmetry. For example, the cross section of the object in figure 6 (a) has a single mirror symmetry and is perceived as planar, whereas the cross section of the object in figure 6 (b) has two mirror symmetries and the perception is that the cross section has two planar parts.

We  can  infer  the  rulings  of  the  surface  by  joining  the  corresponding  points  on  the 
two  curves  forming  the  parallel  symmetry  by  straight  lines,  as  shown  in  figure  5  (b)  (the 
corresponding  points  on  the  two  curves  have  the  same  tangent).  Note  that  the  orientation 
of  a  ZGC  surface  does  not  change  along  a  ruling  (this  is  also  proved  as  a  byproduct  of 
the  above  proofs  in  the  appendix).  Therefore,  if  we  find  the  orientation  of  the  surface 
at  a  single  point  on  a  ruling  we  can  extend  it  along  the  ruling.  We  now  present  the 
constraints  for  finding  the  surface  orientations  at  these  points. 

3.1  Constraints 

We  now  give  some  constraints  that  derive  from  observations  of  the  symmetries  and  other 
boundaries  in  the  image.  We  formulate  three  constraints  discussed  in  sub-sections  below 
and  then  discuss  how  to  combine  them. 
(a) (b)

Figure  6:  Objects  with  cross  sections  having  (a)  only  one  mirror  symmetry,  (b)  two 
mirror  symmetries 

3.1.1  Curved  Shared  Boundary  Constraint  (CSBC) 

This  constraint  relates  the  orientations  of  the  two  surfaces  on  opposite  sides  of  an  edge. 
The  planar  version  has  been  used  since  early  days  in  polyhedral  scene  analysis  [Mac73], 
though  the  term  shared  boundary  constraint  is  ours. 

Consider two surfaces $X_1(u,v)$ and $X_2(u,v)$ meeting at a curve $\Gamma(s) = (x(s), y(s), z(s))$ as in figure 7. Since the curve $\Gamma(s)$ is generated by the intersection of the surfaces $X_1$ and $X_2$, $\Gamma(s)$ is a curve on both $X_1$ and $X_2$. Therefore the tangent vector $\Gamma'(s) = (x'(s), y'(s), z'(s))$ of $\Gamma(s)$ is a vector on the tangent planes of both $X_1$ and $X_2$ along the curve $\Gamma$.

Say $N_1(u,v)$ and $N_2(u,v)$ are the normals of $X_1$ and $X_2$ respectively. Along the curve $\Gamma(s)$ we can represent the normals $N_1$ and $N_2$ as $N_i(s) = N_i(u_i(s), v_i(s))$. Since $\Gamma'(s)$ is on the tangent planes of both $X_1$ and $X_2$, $\Gamma'(s)$ is orthogonal to both $N_1(s)$ and $N_2(s)$. That is

$$N_1(s) \cdot \Gamma'(s) = 0, \qquad N_2(s) \cdot \Gamma'(s) = 0 \tag{1}$$

We can rewrite it as $\Gamma'(s) \cdot (N_2(s) - N_1(s)) = 0$.

Say the normals $N_i(s)$ are represented in $p$-$q$ space as $N_i(s) = (p_i(s), q_i(s), 1)$. Substituting these in the above equation gives:

$$(x'(s), y'(s), z'(s)) \cdot \big((p_2(s), q_2(s), 1) - (p_1(s), q_1(s), 1)\big) = 0$$

$$x'(s)\,(p_2(s) - p_1(s)) + y'(s)\,(q_2(s) - q_1(s)) = 0 \tag{2}$$

This is the Curved Shared Boundary Constraint (CSBC), which states that along the curve $\Gamma(s)$ the orientations of the surfaces $X_1$ and $X_2$ are constrained by the tangent, $(x'(s), y'(s))$, of the image of the curve $\Gamma(s)$ under orthographic projection. This constraint has also been derived previously by Shafer et al. [STK83].

A stronger constraint can be obtained if we can assume that the intersection curve, $\Gamma$, is planar. Say $\Gamma$ lies in a plane with orientation $(p_c, q_c)$. With the assumption of planarity the constraint equation becomes:

$$x'(s)\,(p_c - p(s)) + y'(s)\,(q_c - q(s)) = 0 \tag{3}$$

We will apply this constraint to one of the curves producing parallel symmetry for a ZGC surface.
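As a numerical illustration of equations (2) and (3), the sketch below evaluates the left-hand side of the constraint for a simple pair of surfaces. The example surfaces and the function name `csbc_residual` are assumptions made for illustration. The orientations are taken as the gradients $(\partial z/\partial x, \partial z/\partial y)$; since only differences of orientations enter the constraint, flipping the sign convention of $(p, q)$ for both surfaces leaves it unchanged.

```python
import numpy as np


def csbc_residual(tangent_xy, pq_a, pq_b):
    """Left-hand side of eq. (2): x'(p_b - p_a) + y'(q_b - q_a)."""
    x_prime, y_prime = tangent_xy
    return x_prime * (pq_b[0] - pq_a[0]) + y_prime * (pq_b[1] - pq_a[1])


def example_residuals(ts):
    """Parabolic cylinder z = x^2 cut by the plane z = y.

    The intersection curve is (t, t^2, t^2), so its image tangent is (1, 2t).
    The cylinder has orientation (2t, 0) along the curve and the plane has
    the constant orientation (p_c, q_c) = (0, 1), as in the planar form (3).
    """
    return np.array([csbc_residual((1.0, 2.0 * t), (2.0 * t, 0.0), (0.0, 1.0))
                     for t in ts])
```

For every sample on the intersection curve the residual vanishes, while perturbing the surface orientation, for example to $(2t + 0.5, 0)$, gives a non-zero value.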

3.1.2  Inner  Surface  Constraint  (ISC) 

The  inner  surface  constraint  restricts  the  relative  orientations  of  the  neighboring  points, 
within  a  surface.  For  ZGC  surfaces  the  image  of  the  rulings  of  the  surface  are  used  to 
constrain  the  surface  orientation  of  neighboring  points. 
Figure 7: Two curved surfaces meeting at a curve $\Gamma$


Figure  8:  The  inner  surface  constraint. 

Let $X(u,v) = (x(u,v), y(u,v), z(u,v))$ be a $(u,v)$ parametric representation of the surface $X$, and let $v$ be along the direction of minimum curvature (rulings for ZGC surfaces). We can form an orientation function in terms of the parameters $u$ and $v$: $O(u,v) = (p(u,v), q(u,v))$. The Inner Surface Constraint (ISC) states that for a constant value of the parameter $v$, say $v_0$, as the parameter $u$ changes, the direction of the function $O$ in the $p$-$q$ plane, $O_u = (p_u, q_u)$, should be orthogonal to the direction of the image of the tangent of the rulings, that is the lines of symmetry $(x_v, y_v)$ (under orthographic projection). That is:

$$(p_u, q_u) \cdot (x_v, y_v) = 0$$

$$p_u x_v + q_u y_v = 0$$

$$\frac{p_u}{q_u} = -\frac{y_v}{x_v} \tag{4}$$

The proof of this property is given in appendix A.2. Geometrically ISC can be described as follows: as we move along the axis of parallel symmetry (the $u$ parameter curve) the surface orientation should move in the $p$-$q$ plane in a direction orthogonal to the image of the rulings (the lines of parallel symmetry). For cylindrical surfaces, for example, this ISC curve is a straight line, since all rulings are parallel to each other. Note that this constraint does not require any regularity assumptions about the contour.

The above equation expresses the inner surface constraint in a continuous domain. In digital domain, suppose the surface orientation is to be computed at $n$ points for a ZGC surface (these $n$ points are along the axis of the parallel symmetry, since the surface orientation for a ZGC surface does not change along the rulings). We have $2n$ unknowns, $(p_i, q_i)$ for $n$ points. This constraint provides us with $n - 1$ constraint equations as shown below.

Figure 9: The three degrees of freedom present, $p_c$, $q_c$, $d$, in a ZGC surface after applying the constraints ISC and CSBC.

Let the image of the ruling $r_i$ between the $i$th and $(i+1)$st points make an angle $\gamma_i$ with the horizontal as in figure 8. The constraint equation relates the change in orientation along the axis of symmetry, $(p_u, q_u)$, to the tangent of the ruling, $(x_v, y_v)$. Here the tangent of the ruling is $(x_v, y_v) = (\cos\gamma_i, \sin\gamma_i)$ and the derivatives $(p_u, q_u)$ can be approximated by first order differences as $(p_u, q_u) = (p_{i+1} - p_i,\; q_{i+1} - q_i)$. Substituting these in equation 4 gives

$$(p_{i+1} - p_i)\cos\gamma_i + (q_{i+1} - q_i)\sin\gamma_i = 0 \tag{5}$$
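To make the count of $n - 1$ equations in $2n$ unknowns concrete, the discrete constraint (5) can be stacked into a linear system. The sketch below is illustrative only; the function name `isc_matrix` and the ordering of the unknowns as $(p_1, q_1, \ldots, p_n, q_n)$ are assumptions.

```python
import numpy as np


def isc_matrix(gammas):
    """Stack eq. (5) for all rulings into A, so that A @ [p_1, q_1, ..., p_n, q_n] = 0.

    gammas[i] is the image angle of the ruling between points i and i+1,
    so len(gammas) = n - 1 and A has shape (n - 1, 2n).
    """
    m = len(gammas)
    A = np.zeros((m, 2 * (m + 1)))
    for i, gamma in enumerate(gammas):
        c, s = np.cos(gamma), np.sin(gamma)
        A[i, 2 * i], A[i, 2 * i + 1] = -c, -s          # -(p_i, q_i) . ruling
        A[i, 2 * i + 2], A[i, 2 * i + 3] = c, s        # +(p_{i+1}, q_{i+1}) . ruling
    return A
```

As a check, consider the cone $z = \sqrt{x^2 + y^2}$: along the ruling at image angle $\varphi$ its gradient is $(\cos\varphi, \sin\varphi)$, and taking $\gamma_i$ as the mid angle between neighboring samples makes every row of the system vanish. The rows are linearly independent, so ISC alone removes $n - 1$ of the $2n$ degrees of freedom.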

3.1.3 Combination of ISC and CSBC

In digital domain we need to quantize $(p(s), q(s))$ as $(p_i, q_i)$ and estimate $(x'(s), y'(s))$ from the image of $\Gamma(s)$, which is $(x(s), y(s))$ under orthographic projection. If the ZGC surface is to be described at $n$ points then there are $2n + 2$ unknowns, $2n$ for the surface orientations $(p_i, q_i)$ and 2 for the cross section plane $(p_c, q_c)$. The curved shared boundary constraint (CSBC) provides us with $n$ constraint equations. By using it in conjunction with the inner surface constraint (ISC), which provides $n - 1$ equations, we get $2n - 1$ equations. This leaves us with 3 degrees of freedom for describing a ZGC surface totally.

The two constraints are shown graphically in figure 9. A ZGC surface (a frustum) is shown in (a) with rulings and the axis of the symmetry marked on the surface. The inner surface constraint (ISC) curve is shown on the $p$-$q$ plane. Here the section of the ISC curve from the point $(p_i, q_i)$ to $(p_{i+1}, q_{i+1})$ is orthogonal to the ruling $r_i$. The straight lines on the $p$-$q$ plane are the curved shared boundary constraints (CSBC), such that at each point $i$ the tangent of the axis of symmetry (the dotted curve on the surface) is orthogonal to the corresponding CSBC line on the $p$-$q$ plane. The three parameters required to fix all the orientations $(p_i, q_i)$ are: the orientation of the plane containing the intersection curve, $(p_c, q_c)$, and the quantity shown as $d$ in figure 9, which we call the angle parameter. The angle parameter can be described as the distance of the ISC curve from the point $(p_c, q_c)$, or as the angle between the cross section plane and the surface at the point where $d$ is measured (in figure 9, the angle between the cross section surface and the middle ruling). Specifying the length of one of the CSBC lines is enough to fix the angle parameter, $d$.
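The way the three parameters fix all orientations can be sketched as a simple propagation. This is a minimal illustration under stated assumptions, not the authors' implementation: `reconstruct_orientations` is a hypothetical name, the CSBC is used in its planar form (3), and $d$ is taken to be the signed length of the first CSBC line (the paper measures it at the middle ruling in its figure). By (3), each $(p_i, q_i)$ lies on a line through $(p_c, q_c)$ perpendicular to the image tangent at point $i$; by (5), the step from $(p_i, q_i)$ to $(p_{i+1}, q_{i+1})$ is perpendicular to the ruling $r_i$, which determines the position on the next line.

```python
import numpy as np


def reconstruct_orientations(tangents, gammas, pc, qc, d):
    """Recover (p_i, q_i) from (p_c, q_c), d, image tangents and ruling angles.

    tangents: (n, 2) image tangents (x', y') of the cross section curve.
    gammas:   (n - 1,) image angles of the rulings between successive points.
    d:        assumed signed length of the first CSBC line in the p-q plane.
    """
    t = np.asarray(tangents, dtype=float)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    # Direction of each CSBC line: perpendicular to the image tangent, eq. (3).
    normals = np.column_stack([-t[:, 1], t[:, 0]])

    lam = np.empty(len(t))
    lam[0] = d
    for i, gamma in enumerate(gammas):
        ruling = np.array([np.cos(gamma), np.sin(gamma)])
        denom = normals[i + 1] @ ruling
        if abs(denom) < 1e-12:
            raise ValueError("CSBC line is perpendicular to the ruling; ISC cannot fix it")
        # Eq. (5): (pq[i+1] - pq[i]) . ruling = 0, with pq[i] = (pc, qc) + lam[i] * normals[i].
        lam[i + 1] = lam[i] * (normals[i] @ ruling) / denom
    return np.array([pc, qc]) + lam[:, None] * normals
```

For a cone $z = \sqrt{x^2 + y^2}$ cut by a plane $z = c + a x$, feeding in the image tangents of the cut, the mid angles of the radial rulings, $(p_c, q_c) = (a, 0)$ and the true first CSBC length reproduces the cone gradients $(\cos\varphi_i, \sin\varphi_i)$ at all sample points.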
Figure 10: Two cylinders: (a) is cut along the curves of maximal curvature, and (b) is cut in an arbitrary direction while preserving parallel symmetry; now we have the perception of an elliptical cylinder.

3.1.4  Orthogonality  Constraint  (OC) 

We will assume orthogonality between the axis of parallel symmetry and the lines of parallel symmetry. This is equivalent to slicing the surface along rulings to obtain thin skew symmetric planar strips and assuming that these strips are orthogonally symmetric in 3-D, as in Kanade's analysis for polyhedra [Kan81]. This preference is illustrated in figure 10, where in (a) we see a circular cylinder, but in (b) we prefer to see an orthogonal elliptic cylinder rather than a slanted cylinder.

For a ZGC surface, say the tangent of the axis of symmetry makes an angle $\alpha$ with the horizontal and the ruling makes an angle $\beta$ at some point on the surface, as in figure 11. Let the normal of the surface be $N = (p, q, 1)$ at that point. Since the 3-D tangent vectors $A$ and $B$ are on the tangent plane of the surface they can be represented as:

$$A = (\cos\alpha,\; \sin\alpha,\; p\cos\alpha + q\sin\alpha)$$

$$B = (\cos\beta,\; \sin\beta,\; p\cos\beta + q\sin\beta) \tag{6}$$

and from the orthogonality of the 3-D vectors $A$ and $B$ we get $A \cdot B = 0$, or

$$\cos(\alpha - \beta) + (p\cos\alpha + q\sin\alpha)(p\cos\beta + q\sin\beta) = 0 \tag{7}$$

This is the equation of a hyperbola in the $p$-$q$ space, constraining possible orientations for the surface normal $N$. In digital domain we need to digitize $\alpha$, $\beta$, $p$, and $q$ above as $\alpha_i, \beta_i, p_i, q_i$ for each point on the axis of symmetry. This constraint provides us with $n$ equations if the surface orientation is to be computed at $n$ points. In conjunction with the previous constraints we have $2n + 2$ unknowns and $3n - 1$ equations. Therefore we now have an overconstrained case for $n > 3$. However, not all of these equations are always independent.
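Equation (7) is just the dot product of the two vectors in (6) written out, which a short sketch can confirm. The function names are hypothetical and the example values are chosen only for illustration.

```python
import numpy as np


def tangent_3d(angle, p, q):
    """3-D tangent vector of eq. (6) whose image makes the given angle with the horizontal."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c, s, p * c + q * s])


def orthogonality_residual(alpha, beta, p, q):
    """Left-hand side of eq. (7); zero when A and B are orthogonal in 3-D."""
    return (np.cos(alpha - beta)
            + (p * np.cos(alpha) + q * np.sin(alpha))
            * (p * np.cos(beta) + q * np.sin(beta)))
```

For example, with $p = q = 1$ and $\alpha = 0$, the constraint reduces to $2\cos\beta + \sin\beta = 0$, so it is satisfied by $\beta = \arctan(-2)$: the image angle between axis and ruling is then not a right angle even though the 3-D vectors are orthogonal.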

3.2  Combining  the  Constraints 

The three different constraints of the previous sections provide $3n - 1$ constraint equations for $n$ points, producing $2n + 2$ unknowns (including $(p_c, q_c)$). This suggests that the system of equations is overconstrained. In other words, given the contours of a general ZGC surface, it may not be possible to find an interpretation for the contours such that the surface obeys all the given constraints exactly. However, for special but important cases, this set of constraints is dependent and may give a unique answer or even leave an extra degree of freedom. Also, even when there are no dependencies between the constraints, there may still not be a unique solution.




Figure  11:  Orthogonality  constraint 

Cylindrical Surfaces: In previous work [UN88], we have shown that for a cylindrical
surface the constraints leave one degree of freedom undetermined: the orientation of the
plane containing the parallel symmetry curve is constrained to lie on a line in the $p$-$q$
space which passes through the origin and is in the direction of the rulings.

Circular Cones: It can be shown that these constraints give a unique solution for the
case of a circular cone. We omit a detailed analysis for lack of space.

General ZGC Surfaces: As noted earlier in this section, for surfaces other than cylindrical
surfaces and the circular cone, the three constraints cannot be satisfied exactly. We
believe that in most cases the planarity assumption is stronger than the orthogonality
assumption. Therefore, the following process tries to maximize the orthogonality while
keeping the constraints ISC and CSBC satisfied exactly.

Given a contour having parallel symmetry and a straight mirror symmetry, construct
the surface orientation using constraints ISC and CSBC. With these constraints we can
construct the surface orientation $(p_i, q_i)$ at every point given three parameters: $(p_c, q_c)$
and the angle parameter $d$ (see figure 9). Then choose the values of $(p_c, q_c)$ and $d$ that
minimize the orthogonality error:

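$E = \sum_{i=1}^{n} \cos^2\theta_i$  (8)

where $\theta_i$ is the angle between the two 3-D vectors ($A$ and $B$ in figure 11) whose projections
on the image plane make angles $\alpha_i$ and $\beta_i$ with the horizontal. $\cos^2\theta_i$ is given by

$\cos^2\theta_i = \frac{(\cos(\alpha_i - \beta_i) + (p_i\cos\alpha_i + q_i\sin\alpha_i)(p_i\cos\beta_i + q_i\sin\beta_i))^2}{(1 + (p_i\cos\alpha_i + q_i\sin\alpha_i)^2)(1 + (p_i\cos\beta_i + q_i\sin\beta_i)^2)}$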

Here $(p_i, q_i)$ depend on $(p_c, q_c)$ and $d$ as given by constraints ISC and CSBC. We
want to maximize the orthogonality by minimizing the above function $E$ over $(p_c, q_c)$ and
$d$. We can convert this problem into a 2-D minimization problem by associating with each
choice of $(p_c, q_c)$ the $d$ value that minimizes $E$.
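
The orthogonality error can be sketched in a few lines, under the reading that each term is the squared cosine of the 3-D angle between $A$ and $B$ (the squared dot product divided by the two squared lengths). The function names are illustrative, and the per-point orientations $(p_i, q_i)$ are taken as given inputs here; in the method they would come from constraints ISC and CSBC, which this sketch does not implement.

```python
import math

def cos2_theta(alpha, beta, p, q):
    """Squared cosine of the 3-D angle between tangents A and B at one point."""
    za = p * math.cos(alpha) + q * math.sin(alpha)
    zb = p * math.cos(beta) + q * math.sin(beta)
    numerator = (math.cos(alpha - beta) + za * zb) ** 2
    denominator = (1.0 + za * za) * (1.0 + zb * zb)
    return numerator / denominator

def orthogonality_error(alphas, betas, ps, qs):
    """Sum of the squared cosines over all sampled points (angles in radians)."""
    return sum(cos2_theta(a, b, p, q)
               for a, b, p, q in zip(alphas, betas, ps, qs))

# Two sample points on a fronto-parallel patch: one with perpendicular image
# directions (contributes 0) and one with coincident directions (contributes 1).
E = orthogonality_error([math.pi / 2, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Because every term lies in [0, 1], the error is bounded by the number of points, and it vanishes only when all sampled tangent pairs are exactly orthogonal.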

Unfortunately, for a general conic surface the global minimum of $E$ occurs when
$(p_c, q_c) = (0, 0)$ and $d = \infty$; this is an infeasible interpretation. However, the function $E$,
in terms of $(p_c, q_c)$, has a "valley" of local minima (passing through the origin of the $p$-$q$
space) and the valley is typically a straight line. Any choice of $(p_c, q_c)$ along this valley
is essentially equally acceptable, i.e. we have one degree of freedom to fix. In section 3.3
we discuss how to choose a specific value of $(p_c, q_c)$ on that line using the shape of the
cross section.

Figure 12: Two circular cylinders: (a) is without the complete cross section, (b) is with the
complete cross section.

3.3  Estimating $(p_c, q_c)$

As discussed in section 3.2, the previous three constraints (ISC, CSBC, OC) leave one
degree of freedom, namely constraining the orientation of the cross section plane, $(p_c, q_c)$,
to be along the minimum line of the function $E$. However, it is computationally very
expensive to compute this line. Therefore we use the following gradient descent
algorithm to compute $(p_c, q_c)$.

1.  Choose a starting line, $l_0$, passing through the origin in the $p$-$q$ space, in the
direction of the mirror symmetry axis. Set the current line $l = l_0$.

2.  Compute $(p_c, q_c)$ for the line $l$ using the method described below.

3.  Compute the value of $E$ for $(p_c, q_c)$; check if $(p_c, q_c)$ is along the minimum line of $E$
by repeating the above process for lines ±55 degrees off the line $l$, and by comparing
the $E$ values for these lines.

4.  If $(p_c, q_c)$ is along the minimum line of $E$, stop. Otherwise choose another line by
rotating the line $l$ 60 degrees in the direction of descending $E$, and go to step 2.
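
The control flow of steps 1 to 4 can be sketched as a one-dimensional descent over the angle of the line. This is only a sketch under stated assumptions: `error_at` is a hypothetical callable that, for a line at a given angle, estimates $(p_c, q_c)$ on that line and returns the resulting $E$; and a single `step_deg` parameter is used both for probing the neighbouring lines and for rotating, which is a simplification chosen here rather than something the text specifies.

```python
def descend_line_angle(error_at, start_deg, step_deg, max_iter=1000):
    """Rotate a line through the origin of p-q space until E stops decreasing.

    error_at(theta_deg) is a hypothetical callable returning the orthogonality
    error for the (p_c, q_c) estimated on the line at angle theta_deg.
    """
    theta = start_deg
    for _ in range(max_iter):
        e_here = error_at(theta)
        e_plus = error_at(theta + step_deg)
        e_minus = error_at(theta - step_deg)
        # Step 4: stop when neither neighbouring line gives a lower error.
        if e_here <= e_plus and e_here <= e_minus:
            return theta
        # Otherwise rotate towards the side on which E descends.
        theta += step_deg if e_plus < e_minus else -step_deg
    return theta

# Toy error surface with its valley at 30 degrees, used only for illustration.
best = descend_line_angle(lambda t: (t - 30.0) ** 2, start_deg=0.0, step_deg=5.0)
```

The answer is only accurate to within the step size, so a practical variant might repeat the search with a smaller step once the coarse search stops.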

Computing $(p_c, q_c)$ given a line $l$: We rotate the coordinate system such that the
line $l$ is aligned with the $q$ axis of the $p$-$q$ plane; then we have $p_c = 0$ and $q_c$ is the
unknown quantity.

To fix $q_c$, we need to use the shape of the cross section. We propose a method for doing
so that is based on perceptual properties rather than on mathematical constraints. We
observe that humans prefer compact shapes but tend to avoid very high or very low
slant angles. Compactness, defined as (area)/(perimeter)$^2$, has been used previously
in [BY84] and [HB88]. However, these methods require closed boundaries, which are not
always available; see, for example, figure 12 (a).
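
For reference, the compactness measure mentioned above is straightforward to compute for a closed polygonal boundary. The sketch below is a minimal illustration (the function name and the polygon representation are assumptions), and it makes the limitation concrete: the shoelace area is only defined when the boundary is closed.

```python
import math

def compactness(polygon):
    """area / perimeter**2 for a closed polygon given as a list of (x, y) vertices."""
    n = len(polygon)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]  # wrap around to close the boundary
        twice_area += x0 * y1 - x1 * y0  # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0 / perimeter ** 2

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
```

The measure is scale invariant and is largest for a circle, where it equals $1/(4\pi)$; elongated shapes such as the rectangle above score lower than the square.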

Our  basic  method  consists  of  fitting  an  ellipse  to  the  observed  contour  and  computing 
the  orientation  that  would  backproject  it  into  the  ellipse  of  least  eccentricity,  consistent 
with  the  other  constraints.  A  correction  that  biases  the  answer  towards  45°  is  applied. 


Figure 13: (a) A cylindrical object and the ellipse fitted to the cross section. (b) The orientation
$(p_e, q_e)$ that would make the ellipse a circle; its projection on the $q$ axis gives $q_e$, the first
approximation to $q_c$.

First Estimation of $q_c$: An ellipse fitting process is utilized to obtain a first approximation
of $q_c$. An ellipse is fit to the cross section contour; then the orientation of the circle
that would project as the fitted ellipse is projected onto the $q$ axis of the $p$-$q$
plane to obtain the first approximation of $q_c$, which we call $q_e$. Figure 13 shows an example.
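
The back-projection step can be illustrated with standard orthographic geometry: a circle on a plane with slant $\sigma$ projects to an ellipse whose minor-to-major axis ratio is $\cos\sigma$, with the minor axis along the tilt (gradient) direction. The sketch below assumes the ellipse parameters are already available from some fitting routine, which is not shown; the function name and argument conventions are illustrative.

```python
import math

def circle_orientation_from_ellipse(semi_major, semi_minor, minor_axis_angle):
    """Gradient (p, q) of a plane on which a circle projects to the given ellipse.

    Assumes orthographic projection. minor_axis_angle (radians) is the image
    direction of the ellipse's minor axis, i.e. the tilt direction of the plane.
    The opposite gradient (-p, -q) yields the same ellipse; this sketch returns
    only one of the two candidates.
    """
    if not 0.0 < semi_minor <= semi_major:
        raise ValueError("expected 0 < semi_minor <= semi_major")
    # cos(slant) = b / a, hence tan(slant) = sqrt(a^2 - b^2) / b.
    tan_slant = math.sqrt(semi_major ** 2 - semi_minor ** 2) / semi_minor
    return (tan_slant * math.cos(minor_axis_angle),
            tan_slant * math.sin(minor_axis_angle))

# Ellipse with axis ratio 1/2 and minor axis along the q axis: slant of 60 degrees.
p_e, q_e = circle_orientation_from_ellipse(2.0, 1.0, math.pi / 2)
```

With the coordinate system rotated so that the current line coincides with the $q$ axis, the second component of the returned gradient plays the role of $q_e$.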

It may also be necessary to segment the cross section if it is complex and repetitive.
To achieve this, the concavities of the contour are found and matched. If they match in
such a way that the cross section is segmented into similar pieces, then a different ellipse
is fit to each piece of the contour and the average of the ellipses is used to estimate $q_e$.
Figure 14 shows various objects and the ellipses fit to their cross sections.

Updating $q_e$: The purpose of this updating process is to simulate the bias that humans
have in orienting the cross section toward 45°. We update $q_e$ to obtain the final $q_c$ as
follows (after converting $q_e$ into degrees):

$q_c = 45^\circ + \lambda (q_e - 45^\circ)$  (9)

where $\lambda$ is a confidence factor in the range [0, 1] and is a function of how well the
ellipse approximates the cross section curve. Intuition suggests that the better the
approximation of the ellipse, the higher the value of $\lambda$ should be, and the closer $q_e$ is to
45°, the less the correction should be. The $\lambda$ we are using is:

$\lambda(\epsilon) = 1 - \epsilon^2$  (10)

where $\epsilon$ is the ellipse fit error (in the range [0, 1]). We believe that the exact form of the
function is not critical. Small changes in $q_c$ do not radically affect the perceived surface
shape and humans too estimate $q_c$ very imprecisely.
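
The update rule is simple enough to state directly in code. The sketch below assumes the first estimate has already been converted to an angle in degrees and that the fit error has been normalized to [0, 1]; how that normalization is done is not specified here, and the function names are illustrative.

```python
def confidence(fit_error):
    """Confidence factor: 1 - eps**2, for a fit error eps in [0, 1]."""
    if not 0.0 <= fit_error <= 1.0:
        raise ValueError("fit_error must lie in [0, 1]")
    return 1.0 - fit_error ** 2

def update_orientation(q_e_deg, fit_error):
    """Pull the first estimate q_e (degrees) toward 45 degrees by the confidence factor."""
    lam = confidence(fit_error)
    return 45.0 + lam * (q_e_deg - 45.0)

q_c = update_orientation(65.0, 0.5)  # confidence 0.75, so 45 + 0.75 * 20
```

A perfect fit (error 0) leaves the estimate unchanged, while a useless fit (error 1) falls back to 45° regardless of the first estimate.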

Validation: As the method described here is perceptual rather than mathematical,
we performed a study of human estimation of the orientation of the cross sections, using a
number of subjects and a number of test objects. Space does not allow us to describe
the study in detail. In brief, we found that human perception is rather imprecise in its
estimate of the desired orientation (even when measured relatively), with an average
standard deviation of 8°. We found that the results of our algorithm match the human
estimates well given this large variance (the average deviation from the human average was
6°).

Figure 14: Objects and ellipses fit for their cross sections. For some of the objects above,
the fitted ellipses are not totally visible due to their closeness to the actual contour.

3.4  Computational  Results 

In figure 15 we show the computed surface orientations for various object surfaces. The
symmetries of the contour curves are assumed to be given. For each object $q_c$ is computed
using the method described in section 3.3. Then the angle parameter $d$ is computed by
minimizing the orthogonality error $E$ given in equation 8. The surface orientation $(p_i, q_i)$
at each point is then computed by using constraints ISC and CSBC as illustrated in figure
9. In figure 15, on the left we show the 2-D contour of the object, in the middle we
show needle images of the objects computed, and on the right the objects are shaded with
the orientation computed for each point on the surface.

It is worth noting that for all the objects the computed orientation at the limb
boundaries of the objects is orthogonal to the boundary, even though this was not an explicit
constraint in our method.

The  cross  section  of  the  object  in  the  last  row  is  segmented  into  two  planar  sections 
based  on  the  observation  of  the  mirror  symmetry  of  the  cross  section.  Each  section  is 
processed  individually  but  the  inner  surface  constraint  is  required  to  apply  between  the 
two  sections  of  the  object. 

4  Conclusion 

We have presented a technique for inferring shape from contour for curved surfaces. This
method has been studied in depth for zero-Gaussian curvature surfaces but we believe
that it extends to doubly curved surfaces as well; we hope to present some results on such
surfaces soon. Our technique requires some assumptions about the observed contours but
these assumptions are minimal, and reasonable in our view. We have also made use of
observed human preferences in resolving one degree of freedom.

Our  method  has  been  implemented  and  tested,  but  it  assumes  that  the  symmetry 
properties  are  given.  For  real  images,  the  symmetries  are  unlikely  to  be  precise  and 
several  alternatives  may  be  available. 

The method presented only exploits the interaction between a curved surface and a
planar surface. For more complicated objects, interactions between two (or more) curved
surfaces exist. We plan to study such objects in further explorations of the described
approach.

Appendix 
A  Proofs 

In this section we give two proofs; one is related to the existence of parallel symmetries
on Zero Gaussian Curvature surfaces (theorem 3) and the other proves the Inner Surface
Constraint. Both proofs use the following surface representation.

Let $X(u, v) = (x(u, v), y(u, v), z(u, v))$ be a $(u, v)$ parametric representation of the class
$C^2$ Zero Gaussian Curvature surface $X$. Let us assume that the $v$ parameter curves are
along the lines of minimum curvature (rulings) of the surface.

The normal, $\mathcal{N}$, of this surface at any point is given by:

$\mathcal{N} = \frac{X_u \times X_v}{|X_u \times X_v|}$  (11)

where $\times$ is the vector product operator, and $|V|$ is the length of the vector $V$. Note here
that $|\mathcal{N}| = 1$. The first, $I$, and second, $II$, fundamental forms of such a surface are given by:


$I(X_u\,du + X_v\,dv) = E\,du^2 + 2F\,du\,dv + G\,dv^2$
$II(X_u\,du + X_v\,dv) = L\,du^2 + 2M\,du\,dv + N\,dv^2$  (12)

where

$E = X_u \cdot X_u$,  $F = X_u \cdot X_v$,  $G = X_v \cdot X_v$
$L = X_{uu} \cdot \mathcal{N}$,  $M = X_{uv} \cdot \mathcal{N}$,  $N = X_{vv} \cdot \mathcal{N}$  (13)

Since the parameter $v$ is along the ruling (a line), the normal curvature of the surface
in the direction $X_v$, given by $II(X_v)$, should be zero; then we have:

$II(X_v) = N = 0$  (14)

The Gaussian curvature, $\kappa$, of such a surface is given by [Lip69]

$\kappa = \frac{LN - M^2}{EG - F^2}$  (15)

Since the Gaussian curvature of the surface is zero, setting $\kappa = 0$ and substituting 0
for $N$ by equation 14 gives $M = 0$.
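
As a concrete check of these relations, the sketch below evaluates the fundamental-form coefficients for one particular ZGC surface, the circular cone $X(u, v) = v(\cos u, \sin u, 1)$, whose rulings are the $v$ parameter curves. The choice of surface and the function names are assumptions made for illustration; the partial derivatives are written out analytically.

```python
import math

def cross(a, b):
    """Vector product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))

def cone_fundamental_forms(u, v):
    """Coefficients (E, F, G), (L, M, N) and Gaussian curvature of the cone at (u, v), v != 0."""
    cu, su = math.cos(u), math.sin(u)
    Xu = (-v * su, v * cu, 0.0)
    Xv = (cu, su, 1.0)
    Xuu = (-v * cu, -v * su, 0.0)
    Xuv = (-su, cu, 0.0)
    Xvv = (0.0, 0.0, 0.0)  # the v curves are straight lines
    n = cross(Xu, Xv)
    length = math.sqrt(dot(n, n))
    normal = tuple(c / length for c in n)  # unit normal
    E, F, G = dot(Xu, Xu), dot(Xu, Xv), dot(Xv, Xv)
    L, M, N = dot(Xuu, normal), dot(Xuv, normal), dot(Xvv, normal)
    kappa = (L * N - M * M) / (E * G - F * F)
    return (E, F, G), (L, M, N), kappa

first, second, kappa = cone_fundamental_forms(0.3, 2.0)
```

For this cone, $N$ and $M$ both vanish while $L$ does not, so the surface is curved across the rulings but has zero Gaussian curvature, as the argument above requires.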

A.1  Proof of Theorem 3

Consider the surface $X$ given in section A. Also assume that the $u$ parameter curves on
the surface $X$ are planar and parallel to each other. We have to show that the tangent
of the $u$ parameter curves, $X_u/|X_u|$, is constant with respect to $v$ (i.e. $X_u/|X_u|$ is a function
of $u$ only).

Let the planes on which the $u$ parameter curves rest have the normal $V$ ($V$ is constant).
Then we have:

$X_u \cdot V = 0 \;\Rightarrow\; 0 = \frac{\partial (X_u \cdot V)}{\partial v} = X_{uv} \cdot V + X_u \cdot V_v = X_{uv} \cdot V$  (16)

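That is, $X_u \perp V$ and $X_{uv} \perp V$ for all $u$ and $v$. Also $X_u \perp \mathcal{N}$ by equation 11 and
$X_{uv} \perp \mathcal{N}$ since $M = 0$; therefore, unless $\mathcal{N} \parallel V$, we have

$X_{uv} = c_1\, \mathcal{N} \times V$ and $X_u = c_2\, \mathcal{N} \times V$  (17)

for some constants $c_1$ and $c_2$. That is, $X_u \parallel X_{uv}$, and the derivative of $X_u/|X_u|$ with respect
to $v$ is:

$\frac{\partial}{\partial v} \frac{X_u}{|X_u|} = \frac{X_{uv}}{|X_u|} - \frac{X_u (X_u \cdot X_{uv})}{|X_u|^3}$  (18)

Since $X_u \parallel X_{uv}$ we can substitute $X_{uv}$ by $X_u |X_{uv}| / |X_u|$ in the above equation:

$\frac{\partial}{\partial v} \frac{X_u}{|X_u|} = \frac{X_u |X_{uv}|}{|X_u|^2} - \frac{X_u (X_u \cdot X_u) |X_{uv}|}{|X_u|^4} = 0$  (19)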


Therefore the tangents of the $u$ parameter curves are parallel to each other at the points
where they meet a particular ruling, resulting in $u$ parameter curves projecting as parallel
symmetric with the lines of symmetry corresponding to the rulings.  □


A.2  Proof  of  the  Inner  Surface  Constraint 


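Here we will prove the inner surface constraint asserted by equation 4.

Consider a surface as given in appendix A: a ZGC surface whose $v$ parameter curves are
along the rulings and whose $u$ parameter curves are arbitrary. Here we have $X_v \cdot \mathcal{N}_u = 0$ since

$0 = \frac{\partial (X_v \cdot \mathcal{N})}{\partial u} = X_{uv} \cdot \mathcal{N} + X_v \cdot \mathcal{N}_u = M + X_v \cdot \mathcal{N}_u = X_v \cdot \mathcal{N}_u$  (20)

We can write $\mathcal{N}$ in terms of the gradient $(p, q)$ as $\mathcal{N} = c\,(p, q, 1)$,
where $c$ is the scale coefficient, equal to $(p^2 + q^2 + 1)^{-1/2}$. Differentiation of $\mathcal{N}$ with
respect to the parameter $u$ gives:

$\mathcal{N}_u = c_u\,(p, q, 1) + c\,(p_u, q_u, 0) = \frac{c_u}{c} \mathcal{N} + c\,(p_u, q_u, 0)$  (21)

If we set $X_v \cdot \mathcal{N}_u = 0$ from equation 20, where $X_v = (x_v, y_v, z_v)$ and $\mathcal{N}_u$ is given in
equation 21, we get:

$X_v \cdot \mathcal{N}_u = \frac{c_u}{c} X_v \cdot \mathcal{N} + c\,(x_v, y_v, z_v) \cdot (p_u, q_u, 0) = 0$  (22)

We also have $\mathcal{N} \cdot X_v = 0$ from equation 11. Therefore

$x_v p_u + y_v q_u = 0$, that is, $\frac{q_u}{p_u} = -\frac{x_v}{y_v}$  (23)

□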
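
As a worked illustration of the inner surface constraint, consider one specific ZGC surface chosen here for convenience (it is not an example from the original text): the circular cone $X(u, v) = v(\cos u, \sin u, 1)$, whose rulings are the $v$ parameter curves. Writing its normal in the form $c\,(p, q, 1)$ and differentiating gives:

```latex
\begin{align*}
X(u,v) &= v\,(\cos u,\ \sin u,\ 1), \qquad X_v = (x_v, y_v, z_v) = (\cos u,\ \sin u,\ 1),\\
X_u \times X_v &= v\,(\cos u,\ \sin u,\ -1) \;\propto\; (p, q, 1)
  \quad\text{with}\quad (p, q) = (-\cos u,\ -\sin u),\\
(p_u, q_u) &= (\sin u,\ -\cos u),\\
x_v p_u + y_v q_u &= \cos u \sin u - \sin u \cos u = 0,
  \qquad \frac{q_u}{p_u} = -\frac{\cos u}{\sin u} = -\frac{x_v}{y_v}.
\end{align*}
```

The ratio form holds wherever $\sin u \neq 0$; the product form $x_v p_u + y_v q_u = 0$ holds everywhere on the cone.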


The same constraint can be shown to be valid for non-ZGC surfaces if the $u$ parameter
curves are chosen to be along the direction of maximum curvature.

Abstract

We analyze the properties of Straight Homogeneous
Generalized Cones (SHGCs) and Constant
Generalized Cylinders (CGCs), and derive
the types of symmetries that the limb
boundaries and cross sections of these objects
produce on the image plane. The constraints
on the 3-D shape of the objects are formulated
based on the symmetries and from the geometry
of the projection models. Finally the methods
that recover the 3-D shape from the image
of their contours are discussed and recovered
surfaces are shown for sample objects.

1  Introduction 

This paper is about inferring 3-D shape from 2-D contours
for a class of objects, namely generalized cones
of constant cross-section (but possibly having complex
shaped axes), which we call CGCs (or snakes), and for
straight homogeneous generalized cones or SHGCs. This
class of generalized cones covers a broad class of objects
of interest; it includes the so-called linear straight
homogeneous generalized cones, solids of revolution, and
pipes of arbitrary shape. Some examples are shown
in figures 1 and 2. The method we describe is based
on, and is a major generalization of, the technique we
developed for inferring shape of zero-Gaussian curvature
(or ZGC) surfaces [Ulupinar and Nevatia, 1988,
Ulupinar and Nevatia, 1990].

Inferring shape of the surfaces in a scene from a single
line drawing is an important and difficult problem in
computer vision. Early work concentrated on analysis of
line drawings of polyhedra [Huffman, 1971, Clowes, 1971,
Mackworth, 1973, Kanade, 1981, Sugihara, 1986]. There
have been other efforts at developing techniques for
curved surfaces such as [Barrow and Tenenbaum, 1981,
Stevens, 1981, Xu and Tsuji, 1987, Horaud and Brady,
1988]. We believe that the techniques presented here
extend the complexity of surfaces that can be analyzed
significantly.

*This research was supported by the Defense Advanced
Research Projects Agency under contract number F33615-87-C-1436,
monitored by the Air Force Wright Aeronautical
Laboratories, DARPA Order No. 3119.


Figure  1:  Sample  SHGCs. 


Our approach is based on an analysis of the symmetries
in a scene. In section 2 we define the symmetries we
use. Then we show how such symmetries arise naturally
in images of the class of objects we study. In section
3 we summarize the constraints that derive from these
symmetries and other properties of boundaries for determining
the 3-D shape. In section 4 we give a summary of
previous work on ZGC surfaces. In sections 5 and 6 we
show how these constraints, and other properties of the
boundary, allow us to infer 3-D shape of the objects in
the scene. Some computational results are also shown.

In  the  subsequent  analysis,  we  will  assume  that  the 
image  is  obtained  by  an  orthographic  projection  (though 
some  of  our  theorems  apply  to  perspective  projection  as 
well)  and  from  a  general  viewpoint. 

Definition 1  General Viewpoint: A scene is said to
be imaged from a general viewpoint if perceptual properties
of the image are preserved under slight variations
of the viewing direction.

Specifically,  the  properties  we  are  interested  in  are: 
straightness  and  parallelity  of  lines  and  symmetry  of 
curves  (symmetries  as  defined  in  the  following). 

2  Symmetry  Definitions  and 
Qualitative  Shape  Inference 

We believe that symmetries have an important role
in shape perception; this also has been noted and
used by many researchers [Nevatia and Binford, 1977,
Nalwa, 1987, Rao, 1988, Kanade, 1981, Stevens, 1981].
We first define two types of symmetries and then show
the conditions under which they may be observed in an
image of CGC or SHGC objects.

2.1  Symmetry  Definitions 

We define two types of symmetries, which we call parallel
symmetry and mirror symmetry. For curves to be
symmetric (parallel or mirror) certain point-wise
correspondences between two curves must exist. We will call
the lines joining the corresponding points on the curves
the lines of symmetry, the locus of the mid points
of these lines the axis of symmetry, and the curves
forming the symmetry the curves of symmetry.

Parallel Symmetry  Let $X_i(s) = (x_i(s), y_i(s), z_i(s))$,
for $i = 1, 2$, be two curves in 3-D parameterized by arc
length $s$.

The curves $X_1(s)$ and $X_2(s)$ are said to be parallel symmetric
if there exists a point-wise correspondence $f(s)$
between them such that $X_1'(s) = X_2'(f(s))$ for all values
of $s$ for which $X_1$ and $X_2$ are defined and $f(s)$ is a continuous
monotonic function. Note that projection of curves
$X_1$ and $X_2$ under orthographic projection produces image
curves that are parallel symmetric such that the 3-D
point correspondence is preserved. Computing symmetry
between two curves using this definition requires
estimating the function $f(s)$ as well. A useful special case
is when $f(s)$ is restricted to be a linear function.
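
For sampled image curves, the special case of a linear correspondence can be tested directly by comparing unit tangents. The sketch below is a simplified 2-D illustration under stated assumptions: both curves are polylines with the same number of samples, and the linear correspondence is taken to be index-to-index. Estimating a general $f(s)$ is not attempted, and the function names are illustrative.

```python
import math

def unit_tangents(points):
    """Unit direction of each segment of a polyline given as [(x, y), ...]."""
    tangents = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        tangents.append(((x1 - x0) / d, (y1 - y0) / d))
    return tangents

def parallel_symmetric(curve_a, curve_b, tol=1e-9):
    """True if corresponding segments of the two polylines have equal unit tangents."""
    ta, tb = unit_tangents(curve_a), unit_tangents(curve_b)
    if len(ta) != len(tb):
        return False
    return all(abs(ax - bx) <= tol and abs(ay - by) <= tol
               for (ax, ay), (bx, by) in zip(ta, tb))

# An arc, a scaled and translated copy of it, and a copy rotated by 90 degrees.
base = [(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(20)]
scaled = [(2.0 * x + 5.0, 2.0 * y - 1.0) for x, y in base]
rotated = [(-y, x) for x, y in base]
```

Scaling and translating a curve preserves its tangent directions, so `base` and `scaled` pass the test, while the rotated copy does not.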

Mirror Symmetry  For mirror symmetry, the point-wise
correspondence should be such that the axis of the
symmetry is straight, and the lines of symmetry are at
a constant angle (not necessarily orthogonal) to the axis
of symmetry. This definition of the mirror symmetry is
similar to that of skew symmetry. We use the term mirror
symmetry in the context of curved surfaces as skew
symmetry has historically been used for planar surfaces
only.

We believe that the symmetries we have defined, either
separately or taken together, give some qualitative as
well as quantitative information about the surface shape.
In [Ulupinar and Nevatia, 1990] we showed that a figure
bounded entirely by one mirror symmetry must be planar
and that a figure bounded by one parallel symmetry
and one mirror symmetry with straight lines of symmetry
must be a ZGC surface (assuming general viewpoint
in both cases). In the following we show the properties
that allow us to infer the presence of PRCGCs and
SHGCs.

First,  we  discuss  some  useful  geometric  properties  of 
differentiable  surfaces. 

2.2  Surfaces  and  Their  Limb  Edges 

Definition 2  The tangent line, $L_V$, of a surface, $S$, at a
point, $P$, in a given direction, $V$, is the line from the
point $P$ in the direction of the tangent of the curve, $C$,
obtained by cutting the surface by a plane, $\Pi$, that passes
through $P$ and contains the normal, $N$, of the surface
at $P$ and the direction given by the vector $V$.


Figure  3:  Tangent  line,  Lv,  of  a  surface  S  at  point  P  in 


Figure 4: Tangent plane, $T_P$, of a surface, $S$, containing
all the tangent lines at point $P$.

Figure  3  shows  an  example. 

It is a well known property in differential geometry
[Do Carmo, 1976] that the tangent lines, $L_{V_i}$, of a surface,
$S$, at a point, $P$, in all possible directions, $V_i \in R^3$,
are on a plane, $T_P$, called the tangent plane of the surface
at $P$. Moreover the plane $T_P$ is orthogonal to the
normal, $N$, of the surface at $P$. This property is shown
graphically in figure 4.

Next,  we  define  limb  edges  and  their  projections  for 
smooth  surfaces. 

Definition  3  The  limb  edge  of  a  surface  is  a  viewpoint 
dependent  curve  on  the  surface  such  that  at  each  point  on 
the  curve  the  surface  normal  is  orthogonal  to  the  viewing 
direction. 

The limb edges project on the image plane as the
bounding curve of the surface. At these edges the surface
smoothly curves around to occlude itself. This definition
of limb edges holds both for orthographic and
perspective projection. Limb edges (also called "occluding
contours") can give some very important information
about the 3-D surface they come from; Koenderink
[Koenderink, 1984] has given a nice analysis in previous
work. We will show how the limb edges help us recover
3-D surface shape later in this paper.

Theorem 1  All the tangent lines of a surface at a point,
$P$, which is on a limb edge of the surface for a given projection
geometry, project as the same line on the image
plane.

Proof  The proof involves a simple combination of the
definition of limb edges and the property of tangent
planes. Since the normal of the tangent plane at $P$
(which is also the normal of the surface at $P$) is orthogonal
to the viewing direction, the tangent plane projects
as a line on the image plane. Therefore all the tangent
lines at $P$, which are included in the tangent plane, also
project to the one line that the plane projects into.  □

Figure 5: An SHGC along the z coordinate axis with
both meridians and cross sections marked.

This theorem, though simple and rather obvious, turns
out to be highly useful in proving other important properties
of limb boundaries.

2.3  SHGCs 

Straight homogeneous generalized cones (SHGCs) are
obtained by sliding a cross section, say $C$, along a
straight axis, say $A$. The cross section is also scaled
as it is swept along the axis by a scaling function, say $r$.
We can parameterize the surface, $S$, of an SHGC, given
the planar cross section $C(u) = (x(u), y(u), 0)$ and the
scaling function $r(t)$, as:

$S(u, t) = (r(t)x(u), r(t)y(u), t)$  (1)

The axis of the SHGC in this case is the z axis of
the coordinate system. An example is shown in figure
5. Note that the cross section curves are generated by
fixing $t$ and varying $u$. We will call the curves generated
by fixing $u$ and varying $t$ the meridians of the
surface. Note that the cross sections of an SHGC are planar
because the cross section function $C(u)$ is planar, and the
meridians of an SHGC are planar since the SHGC has
no twist in its sweep. Let meridian edges of an SHGC be
edges that are along the meridians of the SHGC. Usually
images of SHGCs do not contain meridian edges; however,
such edges may be present if the cross section has
a tangent discontinuity (a corner). Figure 1 shows some
sample SHGCs.
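
The parameterization in equation 1 translates directly into code. In the sketch below the cross section and scaling function are passed in as ordinary callables; the elliptical cross section and linear scaling used in the example are arbitrary choices made for illustration, as are the function names.

```python
import math

def shgc_point(cross_section, scaling, u, t):
    """Point S(u, t) = (r(t) x(u), r(t) y(u), t) on an SHGC whose axis is the z axis."""
    x, y = cross_section(u)
    r = scaling(t)
    return (r * x, r * y, t)

def cross_section_curve(cross_section, scaling, t, us):
    """Samples of a cross section: t fixed, u varying."""
    return [shgc_point(cross_section, scaling, u, t) for u in us]

def meridian_curve(cross_section, scaling, u, ts):
    """Samples of a meridian: u fixed, t varying."""
    return [shgc_point(cross_section, scaling, u, t) for t in ts]

def ellipse(u):
    return (2.0 * math.cos(u), math.sin(u))

def linear_scaling(t):
    return 1.0 + 0.5 * t

ring = cross_section_curve(ellipse, linear_scaling, 2.0,
                           [0.1 * k for k in range(63)])
meridian = meridian_curve(ellipse, linear_scaling, 0.7,
                          [0.25 * k for k in range(9)])
```

Every point of `ring` has the same z coordinate, and every point of `meridian` lies in the plane containing the axis and the direction $(x(u), y(u))$, which is the planarity noted above.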

Theorem 2  For an SHGC, the tangent lines of the surface
in the direction of the axis from the points of any
given cross section intersect at a common point on the
axis of the SHGC.

A proof of this theorem may be found in [Shafer and
Kanade, 1983]. Figure 6 (a) graphically illustrates the
property.
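
The property can also be checked directly from the parameterization $S(u, t) = (r(t)x(u), r(t)y(u), t)$. The short derivation below is a sketch added for illustration and is not the cited proof; $\lambda$ is a parameter along the tangent line, introduced here, and $r'(t) \neq 0$ is assumed.

```latex
\begin{align*}
S_t(u,t) &= \big(r'(t)\,x(u),\ r'(t)\,y(u),\ 1\big),\\
S(u,t) + \lambda\, S_t(u,t) &= \big((r(t) + \lambda r'(t))\,x(u),\ (r(t) + \lambda r'(t))\,y(u),\ t + \lambda\big),\\
\lambda = -\frac{r(t)}{r'(t)} \;&\Rightarrow\; S + \lambda S_t = \Big(0,\ 0,\ t - \frac{r(t)}{r'(t)}\Big).
\end{align*}
```

The meeting point depends on $t$ but not on $u$, so all tangent lines leaving one cross section reach the axis at the same point. When $r'(t) = 0$ the tangent lines are parallel to the axis and do not meet it at a finite point.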

Corollary  The tangents of all meridian edges at the
points they intersect a single cross section intersect the
axis of the SHGC at a single point. Therefore in the
image plane, too, the tangents of the images of the
meridian edges, at the points they intersect a single cross
section, intersect the image of the axis in a single point,
under orthographic or perspective projection.

Figure 6: (a) An SHGC and its tangent lines in the
direction of the axis, emanating from a single cross section
and intersecting at a single point on the axis. (b) The tangent
lines, $T_l$, of limb edges are not the same as the tangent
lines, $T_m$, of the meridians in 3-D.

It has been shown by Shafer [Shafer, 1983] that the
limb edges on an SHGC are not planar. Therefore the
limb edges of an SHGC are necessarily not along its
meridians, and the tangents of the limb boundaries at
the points they intersect the same cross section do not
intersect the axis in 3-D. (Figure 6 (b) shows the limb
edge and its tangent for an SHGC after rotating it, to
show that in 3-D the tangent of the limb edge does not
intersect the axis of the SHGC.) Still, it has been shown
by Ponce [Ponce et al., 1989] that under orthographic
projection the tangents of the limb edges, at the points
they intersect the same cross section, intersect the image
of the axis at a single point. Here we give a simpler proof
which is independent of the projection geometry.

Theorem 3  The tangents of the projections of the limb
edges at the points they intersect the same cross section,
when extended, intersect the image of the axis of
the SHGC at the same point.

Proof  Say the limb edge intersects a given cross section
at point $P$ (see figure 6). The tangent line $T_m$ from
point $P$ in the direction of the axis of the SHGC (the
tangent line of the meridian passing through the point
$P$) intersects the axis of the SHGC. By theorem 1, the
tangent line $T_l$ from point $P$ in the direction
of the tangent of the limb edge projects as the same line
as the tangent line $T_m$, and thus the image of the line $T_l$
intersects the image of the axis at the same point as the
image of the line $T_m$.  □

Since theorem 1 holds under both perspective and
orthographic projection, the above theorem and the proof
hold for both projection geometries.

In the following we show that the cross sections of an
SHGC are parallel symmetric in 3-D with the meridian
curves joining the parallel symmetric points of the cross
sections.

Figure 7: Image of an SHGC cut along its cross sections.
The image of the top cross section curve is $C_t$, the bottom
one is $C_b$, and the limb boundaries are $C_l$ on the left and
$C_r$ on the right.


Theorem 4  The cross sections of an SHGC are parallel
symmetric in 3-D with each other such that the meridian
curves join the parallel symmetric points of the cross
sections.

Proof  We have to show that the direction of the tangent
of the cross sections is independent of the $t$ parameter.
Using the parameterization for an SHGC
given in equation 1, the tangent of the cross sections ($u$
parameter curves) is given by:

$S_u = (r(t)x'(u), r(t)y'(u), 0) = r(t)(x'(u), y'(u), 0)$  (2)

Clearly the direction of $S_u$ is independent of the $t$
parameter.  □

Corollary  The projections of the cross section curves of
an SHGC are also parallel symmetric in the image plane,
and the correspondence function is linear because cross
sections are obtained by scaling a reference cross section
curve without deforming it.

2.3.1  Recovering  the  Cross  sections 

We next show how to find the projections of cross sections
in the image of an SHGC, given the images of its
external contours. Our method does not require complete
cross sections, but only the part that lies on the
visible face of the SHGC. However, we require that the
SHGC be cut along its cross sections; otherwise we would
not have a parallel symmetry between the image curves
of the two extreme cross-sections ($C_t$ and $C_b$ in figure 7).
We conjecture that humans too do not do well if this condition
is not satisfied. The following algorithm recovers
the image curves $C_i$ that correspond to the projections
of the cross sections of the SHGC.

For each point $P_l \in C_l$ do:

1. Find the point $P_{cl} \in C_t$ such that $C_l'(P_l) \equiv C_t'(P_{cl})$.¹

2. Translate the cross section curve $C_t$ such that the point $P_{cl} \in C_t$ coincides with the point $P_l$, obtaining the curve $C_{tt}$.

3. Find the point $P_{cr} \in C_{tt}$ that minimizes the function $f(P_{cr}) = (d_1 + d_2)/d_1$, which is the amount of scaling that must be applied to the curve $C_{tt}$ to bring the point $P_{cr}$ to the point $P_r$. The quantities $d_1$ and $d_2$ are the lengths of the line segments from $P_l$ to $P_{cr}$ and from $P_{cr}$ to $P_r$. It can be shown that a local minimum of the function $f(\cdot)$ gives the correct point $P_{cr} \in C_{tt}$ such that the limb boundary condition $C_{tt}'(P_{cr}) \equiv C_r'(P_r)$ is met.

4. Scale the curve $C_{tt}$ by $f(P_{cr})$ so that the point $P_{cr}$ meets the point $P_r$, obtaining the curve $C_i$.

¹The $\equiv$ operator denotes parallelism of vectors; that is, if $V_1 \equiv V_2$ then $V_1 = \lambda V_2$ for some scalar $\lambda$.

Figure 8: Images of the cross sections and axes recovered for the SHGCs in figure 1.

The curve $C_i$ obtained by this algorithm is precisely the image of the cross section curve between the points $P_l$ and $P_r$ of the SHGC. Once the correspondence of the points $P_l$ and $P_r$ between the limb edges $C_l$ and $C_r$ is obtained, we can recover the image of the axis of the SHGC by using theorem 3. Figure 8 shows the computed images of the cross section curves and the axes for the SHGCs in figure 1. If the parallel symmetric points of the cross section curves are joined, we obtain the meridian curves, by theorem 4.
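The translation and scaling steps of the algorithm (steps 2 and 4) can be sketched as follows. This is an illustrative sketch only: it assumes that the corresponding points have already been found by the searches of steps 1 and 3, that the scaling is performed about the point on the left limb boundary (the point that the translation brings into coincidence), and that the three points are collinear in the image. All function names and coordinates are hypothetical.

```python
import math

def translate_curve(curve, p_from, p_to):
    """Step 2: translate a 2-D curve so that p_from lands on p_to."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    return [(px + dx, py + dy) for px, py in curve]

def scale_factor(p_l, p_cr, p_r):
    """f = (d1 + d2) / d1, with d1 = |P_l P_cr| and d2 = |P_cr P_r|."""
    d1 = math.hypot(p_cr[0] - p_l[0], p_cr[1] - p_l[1])
    d2 = math.hypot(p_r[0] - p_cr[0], p_r[1] - p_cr[1])
    return (d1 + d2) / d1

def scale_about(curve, center, s):
    """Step 4: scale a 2-D curve by s about a fixed center (assumed to be P_l)."""
    cx, cy = center
    return [(cx + s * (px - cx), cy + s * (py - cy)) for px, py in curve]

# Hypothetical data: visible part of the top cross section and limb points.
c_t = [(0.0, 0.0), (1.0, -0.5), (2.0, 0.0)]
p_cl = c_t[0]                 # point on C_t matched to P_l in step 1
p_l = (-0.5, -3.0)            # point on the left limb boundary
c_tt = translate_curve(c_t, p_cl, p_l)
p_cr = c_tt[-1]               # point selected in step 3
p_r = (2.5, -3.0)             # matching point on the right limb boundary
f = scale_factor(p_l, p_cr, p_r)
c_i = scale_about(c_tt, p_l, f)   # recovered image of one cross section
```

With these inputs the recovered curve `c_i` starts at `p_l` and ends at `p_r`, as the algorithm requires.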

2.3.2  Observing  SHGCs 

If there are two parallel symmetric curves with a linear correspondence function, and they are bounded by curves that have a straight axis when the axis is computed by the above algorithm, then we can hypothesize that the line drawing results from an SHGC.

2.4  CGCs  (Snakes) 

Snakes are generalized cones that have a constant cross section but whose axis may be an arbitrary 3-D curve. Following Shafer's terminology [Shafer et al., 1983], such objects may be called CGCs. We will focus on CGCs that have a planar axis and that are "right", i.e., the cross sections are orthogonal to the axis; we call such objects PRCGCs. Figure 2 shows some examples.

In  the  following,  we  show  that  limb  boundaries  of  a 
PRCGC  project  as  parallel  symmetric  curves  under  or¬ 
thographic  projection. 

Figure 9: A PRCGC with both meridians and cross sections marked.

Let us choose a coordinate system such that the axis of the PRCGC lies in the $x$-$z$ plane and one of the cross sections, say $C(u) = (c_x(u), c_y(u), 0)$, is aligned with the $x$-$y$ plane. Let $A(t) = (a_x(t), 0, a_z(t))$ be the axis parameterized in terms of its arc length, that is, $|A'(t)|^2 = a_x'(t)^2 + a_z'(t)^2 = 1$ for all $t$. Also, let $A(0) = (0,0,0)$ and, since the cross section is orthogonal to the axis, $A'(0) = (0,0,1)$. Then the surface of the PRCGC, $S(u,t)$, is given by:

$$S(u,t) = R(A'(0), A'(t)) \cdot C(u) + A(t) \tag{3}$$

where $R(V_1, V_2)$ is the rotation matrix that transforms the direction vector $V_1$ into the vector $V_2$. For $A'(0) = (0,0,1)$ and $A'(t) = (a_x'(t), 0, a_z'(t))$ the rotation matrix $R$ becomes:

$$R = \begin{bmatrix} a_z'(t) & 0 & a_x'(t) \\ 0 & 1 & 0 \\ -a_x'(t) & 0 & a_z'(t) \end{bmatrix} \tag{4}$$


Note that the curves generated by fixing $t$ and varying $u$ are the cross sections of the surface $S(u,t)$. We will call the curves generated by fixing $u$ and varying $t$ the meridians of the surface. The meridians are also the loci of points on the cross section as the cross section is swept along the axis. Figure 9 shows an example.

Lemma  1  The  meridians  of  a  PRCGC  are  parallel 
symmetric  and  the  curves  joining  the  parallel  symmet¬ 
ric  points  of  the  meridians  form  the  cross  sections  of  the 
surface. 


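Proof We need to show that the direction of the tangents of the surface in the direction of the meridians, $\partial S(u,t)/\partial t$, is independent of the parameter $u$.

$$\begin{aligned}
\frac{\partial S(u,t)}{\partial t} &= \frac{dR}{dt}\,C(u) + A'(t) \\
&= \begin{bmatrix} a_z''(t) & 0 & a_x''(t) \\ 0 & 0 & 0 \\ -a_x''(t) & 0 & a_z''(t) \end{bmatrix}
\begin{bmatrix} c_x(u) \\ c_y(u) \\ 0 \end{bmatrix}
+ \begin{bmatrix} a_x'(t) \\ 0 \\ a_z'(t) \end{bmatrix} \\
&= \begin{bmatrix} a_x'(t) \\ 0 \\ a_z'(t) \end{bmatrix}
+ c_x(u) \begin{bmatrix} a_z''(t) \\ 0 \\ -a_x''(t) \end{bmatrix} \\
&= A'(t) + c_x(u)\,(A''(t))^{\perp}
\end{aligned} \tag{5}$$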


where $(A''(t))^{\perp}$ is a vector that is orthogonal to the vector $A''(t)$ and lies in the $x$-$z$ plane. Also note that $A''(t) \cdot A'(t) = 0$, since

$$0 = \frac{d(1)}{dt} = \frac{d(A'(t) \cdot A'(t))}{dt} = 2A'(t) \cdot A''(t) \tag{6}$$

We conclude that the vector $(A''(t))^{\perp}$ is parallel to the vector $A'(t)$, since $A'(t) \perp A''(t)$, $A''(t) \perp (A''(t))^{\perp}$ and all three vectors lie in one plane (the $x$-$z$ plane). Because $|(A''(t))^{\perp}| = |A''(t)|$, we can rewrite equation 5 as:

$$\frac{\partial S(u,t)}{\partial t} = \left(1 \pm c_x(u)\,\frac{|A''(t)|}{|A'(t)|}\right) A'(t) \tag{7}$$

where the sign depends on the direction in which the axis bends. While the length of this vector depends on the $u$ parameter, its direction is independent of the $u$ parameter. □

Figure 10: A PRCGC (half of a torus) (a) from a general view and (b) semi-transparent top view with the limb edges of the previous view and the meridians passing through the points $P_1$ and $P_2$ marked along with their tangent lines.
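Lemma 1 can be illustrated numerically on a simple PRCGC. The sketch below assumes a circular axis of radius `RHO` in the $x$-$z$ plane (so the object is part of a torus-like surface) and an elliptical cross section; both choices, and all function names, are hypothetical. The surface is built from equations 3 and 4 and the meridian tangent is estimated by central differences.

```python
import math

RHO = 2.0  # hypothetical radius of the circular axis

def axis(t):
    """Arc-length parameterized axis with A(0) = (0, 0, 0), A'(0) = (0, 0, 1)."""
    return (RHO * (1.0 - math.cos(t / RHO)), 0.0, RHO * math.sin(t / RHO))

def axis_tangent(t):
    """A'(t) = (a_x'(t), 0, a_z'(t)); it has unit length."""
    return (math.sin(t / RHO), 0.0, math.cos(t / RHO))

def cross_section(u):
    """Hypothetical planar cross section C(u) in the x-y plane."""
    return (0.5 * math.cos(u), 0.3 * math.sin(u), 0.0)

def surface(u, t):
    """S(u, t) = R(t) C(u) + A(t), with R taken from equation 4."""
    ax_p, _, az_p = axis_tangent(t)
    cx, cy, _ = cross_section(u)
    a = axis(t)
    return (az_p * cx + a[0], cy + a[1], -ax_p * cx + a[2])

def meridian_tangent(u, t, h=1e-6):
    """Central-difference estimate of dS/dt along the meridian through u."""
    ahead, behind = surface(u, t + h), surface(u, t - h)
    return tuple((p - m) / (2.0 * h) for p, m in zip(ahead, behind))

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

# Meridian tangent directions at several points of one cross section.
t0 = 1.0
directions = [unit(meridian_tangent(u, t0)) for u in (0.0, 1.0, 2.5, 4.0)]
```

Every entry of `directions` coincides with `axis_tangent(t0)` up to numerical error, in agreement with equation 7.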

Although the meridian curves on a PRCGC are parallel symmetric, it can be shown that the limb edges of a PRCGC are not necessarily parallel symmetric in 3-D (see Figure 10). However, the following theorem proves that the projections of the limb edges of a PRCGC are parallel symmetric under orthographic projection.

Theorem  5  The  limb  edges  of  a  PRCGC  project  as  par¬ 
allel  symmetric  curves  onto  the  image  plane. 


Proof Here we use the properties given in theorem 1 and in lemma 1. Consider the points $P_1$ and $P_2$ in figure 10, chosen so that both points are on the same cross section. As can be seen in figure 10 (b), the tangent lines $l_1$ and $l_2$ from points $P_1$ and $P_2$ in the direction of the limb edges are not parallel symmetric in 3-D. However, the tangent lines $m_1$ and $m_2$ from points $P_1$ and $P_2$ in the direction of the meridians are parallel symmetric by lemma 1. Since, by theorem 1, the tangent line $l_1$ projects to the same line as the tangent line $m_1$, and the tangent line $l_2$ projects to the same line as the tangent line $m_2$, the projections of the limb boundaries of a PRCGC are parallel symmetric. □


2.4.1  Observing  PRCGCs 

If  in  the  image  plane  there  are  parallel  symmetric 
curves  that  are  terminated  by  two  curves  (possibly  closed 
and  having  mirror  symmetry  which  enhances  planarity 
of  the  cross  section)  then  we  hypothesize  that  it  is  a 
PRCGC.  The  real  test  for  the  line  drawing  to  belong  to 
a  PRCGC  may  be  performed  after  the  cross  sections  are 
recovered  as  described  in  section  6.1. 


3  Constraints  for  Determining  Surface 
Shape 

We  now  give  three  constraints  that  derive  from  obser¬ 
vations  of  the  symmetries  and  other  boundaries  in  the 
image. 
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3.1  Curved  Shared  Boundary  Constraint 
(CSBC) 

This  constraint  relates  the  orientations  of  the  two  sur¬ 
faces  on  opposite  sides  of  an  edge.  It  is  a  generalization 
of  the  constraint  used  in  polyhedral  scene  analysis  from 
the  early  days  [Mackworth,  1973]  and  has  been  stated 
previously  in  [Shafer  et  al.,  1983,  Ulupinar  and  Nevatia, 
1990]. 

Let two surfaces, $S_1$ and $S_2$, intersect along a curve, $\Gamma$, whose projection is the curve $\Gamma_I(s) = (r_x(s), r_y(s))$. Let the orientations of the surfaces $S_1$ and $S_2$ along the curve $\Gamma(s)$, in gradient space, be given by $(p_1(s), q_1(s))$ and $(p_2(s), q_2(s))$. Then CSBC states that:

$$r_x'(s)\,(p_2(s) - p_1(s)) + r_y'(s)\,(q_2(s) - q_1(s)) = 0 \tag{8}$$

A stronger constraint can be obtained if we can assume that the 3-D intersection curve, $\Gamma$, is planar. Say $\Gamma$ lies in a plane with orientation $(p_c, q_c)$. With the assumption of planarity the constraint equation becomes:

$$r_x'(s)\,(p_c - p_i(s)) + r_y'(s)\,(q_c - q_i(s)) = 0, \quad i = 1, 2 \tag{9}$$
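As a small check of the shared boundary constraint, consider the simplest case of two planes. The sketch below assumes that a surface with gradient $(p, q)$ is locally $z = px + qy$ (the convention implied by the 3-D vectors used in the orthogonality constraint of this section) and that the projection is orthographic along $z$; the function names and numeric gradients are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersection_image_tangent(grad1, grad2):
    """Image direction of the line where planes z = p x + q y intersect."""
    n1 = (grad1[0], grad1[1], -1.0)   # normal of the first plane
    n2 = (grad2[0], grad2[1], -1.0)   # normal of the second plane
    d = cross(n1, n2)                 # 3-D direction of the intersection line
    return (d[0], d[1])               # orthographic projection drops z

def csbc_residual(tangent, grad1, grad2):
    """Left-hand side of the shared boundary constraint (zero when satisfied)."""
    return (tangent[0] * (grad2[0] - grad1[0])
            + tangent[1] * (grad2[1] - grad1[1]))

grad1 = (0.3, -1.2)
grad2 = (-0.7, 0.4)
tangent = intersection_image_tangent(grad1, grad2)
residual = csbc_residual(tangent, grad1, grad2)
```

The residual vanishes for the true image tangent of the shared edge and is non-zero for an arbitrary direction such as `(1.0, 0.0)`.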

3.2  Inner  Surface  Constraint  (ISC) 

The inner surface constraint restricts the relative orientations of neighboring points within a surface. Consider a curve $C(t) = (x(t), y(t), z(t))$ on a $C^2$ surface $S$. For each point $P \in C$ associate a vector $R \in T_P$ such that

$$\frac{dC}{dt} \cdot dN_R = 0 \tag{10}$$

where $T_P$ is the tangent plane of the surface $S$ at the point $P$ and $dN_R$ is the derivative of the normal $N$ of the surface $S$ in the direction $R$.

Theorem 6 Inner Surface Constraint: Under orthographic projection, if an image curve $C_I$ is the projection of the curve $C$ on the surface $S$ and $R_I = (r_x, r_y)$ is the projection of the vector $R$ satisfying equation 10, then the change of the orientation, $(p,q)$, of the surface $S$ along the curve $C$ in the $p$-$q$ space is restricted by the image vector $R_I$ as:

$$d(p,q)_C \cdot R_I = 0 \tag{11}$$

The  proof  of  the  theorem  is  given  in  appendix  A. 

To apply this constraint, we need to identify a curve $C$ in the image plane for which the direction $R$ can be determined. In a previous paper [Ulupinar and Nevatia, 1990] we have shown that for zero Gaussian curvature surfaces any curve on the surface can be the $C$ curve if the direction $R$ is chosen to be the direction of the rulings of the surface. The following theorem shows how we can use parallel symmetric curves for this purpose in the general case.

Theorem 7 Let the family of curves, $\{C_i\}$, be on a surface $S$ such that the curves $C_i$ are parallel symmetric in 3-D. If the curves $C_i$ are used as the $C$ curves of equation 10, then the tangents of the curves obtained by joining the symmetric points of the curves $C_i$ give the direction $R$ of the ISC. Conversely, if the curves obtained by joining the parallel symmetric points of the curves $C_i$ are used as the $C$ curves of equation 10, then the tangents of the curves $C_i$ give the direction $R$.


Proof Consider the parametric representation $S(u,v)$ of the surface $S$ such that the $u$ parameter curves are parallel symmetric to each other (the $\{C_i\}$ family of curves) and the $v$ parameter curves join the parallel symmetric points of the $u$ parameter curves.

For the first part of the theorem we have to show that equation 10 holds or, with the current parameterization, that

$$S_u \cdot N_v = 0 \tag{12}$$

is true, where $N = (S_u \times S_v)/|S_u \times S_v|$ is the unit normal of the surface. Note that $N \cdot S_u = N \cdot S_v = 0$ by definition. We can substitute $-S_{uv} \cdot N$ for $S_u \cdot N_v$ since:

$$0 = \frac{\partial (S_u \cdot N)}{\partial v} = S_{uv} \cdot N + S_u \cdot N_v \;\Rightarrow\; S_u \cdot N_v = -S_{uv} \cdot N \tag{13}$$

$S_u$ is the tangent of the $u$ parameter curves, and since the $v$ parameter curves join the parallel symmetric points of the $u$ parameter curves, the direction of $S_u(u,v)$ is independent of the $v$ parameter, that is, $S_u(u,v) = c(v)\,S_u(u)$ where $c$ is a scalar function. Then:

$$S_{uv} = c'(v)\,S_u(u) \tag{14}$$

By substituting this in equation 13 we get

$$N_v \cdot S_u = -N \cdot S_{uv} = -c'(v)\,(N \cdot S_u(u)) = 0 \tag{15}$$

For the second part of the theorem we have to show that $S_v \cdot N_u = 0$. Using equation 15 we get:

$$0 = N_v \cdot S_u = -N \cdot S_{uv} = -N \cdot S_{vu} = S_v \cdot N_u \tag{16}$$
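Both parts of theorem 7 can be checked numerically on a surface whose $u$ parameter curves are parallel symmetric. The sketch below uses a hypothetical SHGC-like surface $S(u,v) = (2r(v)\cos u,\; r(v)\sin u,\; v)$ with $r(v) = 1 + 0.3v^2$; the analytic partial derivatives are written out by hand and the derivatives of the unit normal are estimated by central differences. All names are illustrative assumptions.

```python
import math

def s_u(u, v):
    """Analytic S_u for S(u, v) = (2 r(v) cos u, r(v) sin u, v)."""
    r = 1.0 + 0.3 * v * v
    return (-2.0 * r * math.sin(u), r * math.cos(u), 0.0)

def s_v(u, v):
    """Analytic S_v for the same surface, with r'(v) = 0.6 v."""
    r_prime = 0.6 * v
    return (2.0 * r_prime * math.cos(u), r_prime * math.sin(u), 1.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit_normal(u, v):
    """N = (S_u x S_v) / |S_u x S_v|."""
    n = cross(s_u(u, v), s_v(u, v))
    length = math.sqrt(dot(n, n))
    return tuple(c / length for c in n)

def normal_derivative(u, v, wrt, h=1e-6):
    """Central-difference estimate of N_u (wrt='u') or N_v (wrt='v')."""
    if wrt == "u":
        ahead, behind = unit_normal(u + h, v), unit_normal(u - h, v)
    else:
        ahead, behind = unit_normal(u, v + h), unit_normal(u, v - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(ahead, behind))

u0, v0 = 0.9, 0.4
first_part = dot(s_u(u0, v0), normal_derivative(u0, v0, "v"))   # S_u . N_v
second_part = dot(s_v(u0, v0), normal_derivative(u0, v0, "u"))  # S_v . N_u
```

Both `first_part` and `second_part` are zero up to finite-difference error, matching equations 15 and 16.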


3.3  Orthogonality  Constraint  (OC) 

The two previous constraints (CSBC and ISC) are not sufficient to determine surface orientations uniquely. To further constrain the solution, we impose an additional constraint: we require that the cross sections and the meridians of a surface (as defined in sections 2.3 and 2.4) be mutually orthogonal. This constraint may be satisfied precisely for some kinds of surfaces but is not necessarily true for all surfaces; in the latter cases we maximize a measure of orthogonality (given later). This constraint is justified by perceptual observations. It may be viewed as being equivalent to slicing the surface along meridians and cross sections to obtain thin skew symmetric planar patches and assuming that these patches are orthogonally symmetric in 3-D, as in Kanade's analysis for polyhedra [Kanade, 1981]. The orthogonality of two vectors $A$ and $B$, which lie on a plane having gradient $(p,q)$ and whose images are $A_I = (a_x, a_y)$ and $B_I = (b_x, b_y)$, constrains the gradient $(p,q)$ with the equation:

$$(a_x,\; a_y,\; p a_x + q a_y) \cdot (b_x,\; b_y,\; p b_x + q b_y) = 0 \tag{17}$$
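Equation 17 is straightforward to evaluate. The following sketch lifts two image vectors onto a plane with gradient $(p, q)$ and returns their 3-D dot product; the function names and the sample vectors are hypothetical.

```python
def lift(v_img, grad):
    """3-D vector on the plane with gradient (p, q) whose image is v_img."""
    p, q = grad
    return (v_img[0], v_img[1], p * v_img[0] + q * v_img[1])

def oc_residual(a_img, b_img, grad):
    """Left-hand side of equation 17 (zero when the 3-D vectors are orthogonal)."""
    a3, b3 = lift(a_img, grad), lift(b_img, grad)
    return sum(x * y for x, y in zip(a3, b3))

# Image vectors (1, 0) and (0, 1) remain orthogonal on the plane z = y.
on_slanted_plane = oc_residual((1.0, 0.0), (0.0, 1.0), (0.0, 1.0))

# Image vectors (1, 1) and (1, -1) are orthogonal on the frontal plane z = 0
# but not on the plane z = y.
frontal = oc_residual((1.0, 1.0), (1.0, -1.0), (0.0, 0.0))
slanted = oc_residual((1.0, 1.0), (1.0, -1.0), (0.0, 1.0))
```

The last two values show that the same pair of image vectors is compatible with orthogonality for some gradients and not for others, which is how the constraint restricts $(p, q)$.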


4  Analysis  of  ZGC  Surfaces 

We  have  applied  the  constraints  of  section  3  to  analy¬ 
sis  of  zero-Gaussian  curvature  surfaces  in  previous  work 
[Ulupinar  and  Nevatia,  1990].  We  provide  a  brief  sum¬ 
mary  of  this  work  as  the  techniques  for  the  PRCGCs  and 
SHGCs  are  related  to  it. 
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A  ZGC  surface  is  indicated  by  the  presence  of  a  par¬ 
allel  symmetry  and  a  mirror  symmetry  where  lines  of 
symmetry  are  straight.  The  parallel  symmetry  curves 
give  the  cross  sections  of  the  ZGC  and  the  lines  of  sym¬ 
metry  give  the  rulings.  For  a  ZGC,  it  is  necessary  only 
to  consider  one  cross  section  at  a  time,  as  the  surface 
orientations  can  simply  be  propagated  along  the  ruling. 

Suppose we wish to estimate the surface orientation at $n$ points along the cross section (assumed to be planar). We then have $2n + 2$ unknowns ($2n$ for the $n$ points and 2 for the orientation of the cross section itself). ISC and CSBC together provide $2n - 1$ constraint equations, leaving three degrees of freedom undetermined. Introducing the orthogonality constraint (in this case requiring the cross section and rulings to be orthogonal) gives an additional $n$ equations; we now have more equations than unknowns.

These equations are, however, not always independent. We find that for a cylindrical surface, all equations can be satisfied exactly and still one degree of freedom remains for the orientation $(p_c, q_c)$ of the cross section plane (it is constrained to be on a line parallel to the axis of the cylinder in the $p$-$q$ plane). For more general objects, all equations cannot be solved exactly. We choose to satisfy CSBC and ISC exactly and to minimize the deviation from orthogonality. Unfortunately, this minimization procedure also does not, in general, give a unique answer. The minimum is typically achieved when $(p_c, q_c)$ is along a line in the gradient space, and the variations along this line are too small to pick a specific value.

This  last  degree  of  freedom  is  removed  by  using  the 
3-D  shape  of  the  cross  section  itself.  We  make  the  as¬ 
sumption  that  the  3-D  cross  section  should  be  as  com¬ 
pact  as  possible,  subject  to  the  limits  given  by  other 
constraints.  Our  method  to  accomplish  this  consists 
of  fitting  an  ellipse  to  the  cross  section  and  choosing 
that  orientation  that  gives  the  least  eccentric  ellipse  in 
the  back  projection  subject  to  the  orientation  satisfying 
other  constraints  (namely,  its  being  on  a  specific  line). 
Also, we apply a correction to this estimate depending on the quality of the ellipse fit to bias the answer away from highly slanted orientations. This algorithm is fully described in [Ulupinar and Nevatia, 1990]; an outline is given below.

As $(p_c, q_c)$ is constrained to be on a line, the problem is equivalent to estimating only one parameter, say $q_c$ (without loss of generality, as we can rotate the coordinate system as necessary). The steps in estimating $q_c$ are:

1. First estimation of $q_c$: An ellipse is fit to the cross section contour. The orientation $(p_e, q_e)$ of the circle that would project as the fitted ellipse is then projected onto the $q$ axis of the $p$-$q$ plane to obtain the first approximation of $q_c$, call it $q_e$.

It may be necessary to segment the cross section if it is complex and repetitive. To achieve this, the concavities of the contour are found and matched. If they match in such a way that the cross section is segmented into similar pieces, then a different ellipse is fit to each piece of the contour and the average of the ellipses is used to estimate $q_e$.

2. Updating $q_c$: The purpose of this updating process is to simulate the bias that humans have in orienting the cross section toward 45°. We update $q_e$ to obtain the final $q_c$ as follows (after converting $q_e$ into degrees):

$$q_c = 45^\circ + \lambda\,(q_e - 45^\circ) \tag{18}$$

where $\lambda$ is a confidence factor in the range [0, 1] and is a function of how well the ellipse approximates the cross section curve. In our implementation it is given by:

$$\lambda(\varepsilon) = 1 - \varepsilon^2 \tag{19}$$

where $\varepsilon$ is the ellipse fit error (in the range [0, 1]).

The  algorithm  derives  from  our  observations  of  human 
perception  and  we  have  validated  it  by  an  extensive  com¬ 
parison  with  human  subjects. 
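The update step of equations 18 and 19 can be written in a few lines. This is a minimal sketch that assumes the first estimate has already been converted to an angle in degrees and that the ellipse fit error has been normalized to [0, 1]; the function names are hypothetical, and the ellipse fitting itself is not shown.

```python
def confidence(fit_error):
    """Equation 19: confidence factor from the ellipse fit error in [0, 1]."""
    if not 0.0 <= fit_error <= 1.0:
        raise ValueError("fit_error must lie in [0, 1]")
    return 1.0 - fit_error ** 2

def update_estimate(q_e_deg, fit_error):
    """Equation 18: pull the first estimate (in degrees) toward 45 degrees."""
    lam = confidence(fit_error)
    return 45.0 + lam * (q_e_deg - 45.0)

perfect_fit = update_estimate(70.0, 0.0)    # estimate is kept unchanged
moderate_fit = update_estimate(70.0, 0.5)   # estimate moves part way to 45
worst_fit = update_estimate(70.0, 1.0)      # estimate collapses to 45
```

A perfect fit leaves the estimate untouched, while a fit error of 1 replaces it by 45°, so poorly fitting ellipses have little influence on the final orientation.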

The described method for recovering ZGC surfaces from image contours has been tested on a number of examples (we assume that symmetries are given) and produces results that appear consistent with human observation.

5  Quantitative  Shape  Recovery  of 
SHGC  surfaces 

To compute the shape of an SHGC along each recovered cross section curve we can apply the constraints discussed in section 3 as they are applied to a ZGC surface in section 4. In the following, say that there are $m$ cross section curves and that we would like to compute the orientation of the surface at $n$ points along each cross section. Then we have $2nm$ unknowns, initially, corresponding to the gradient $(p,q)$ of the surface at $nm$ points.

CSBC The curved shared boundary constraint applies between the orientation, $(p_c, q_c)$, of the cross section curves $C_j$ and the orientation, $(p_i, q_i)$, of each point on the surface along a cross section. Note that $(p_c, q_c)$ is the same for all cross section curves. The curved shared boundary constraint states that the line in the $p$-$q$ space from the gradient $(p_i, q_i)$ of a point $P_i \in C_j$ to the gradient $(p_c, q_c)$ of the cross section plane is orthogonal to the tangent, $C_j'(P_i)$, of the cross section $C_j$ at point $P_i$. Then the constraint equation is:

$$(p_c - p_i,\; q_c - q_i) \cdot C_j'(P_i) = 0 \quad \forall P_i \in C_j \tag{20}$$

This provides $n$ constraints along each cross section curve.

ISC The inner surface constraint is applied along a cross section using the tangents of the meridians at each point. Theorem 7 indicates that ISC is applicable along the cross section curves because, by theorem 4, the cross section curves are parallel symmetric, with the meridian curves joining their parallel symmetric points. The inner surface constraint states that the change of the orientation, $(p_{i+1} - p_i,\; q_{i+1} - q_i)$, of the surface along a cross section curve $C_j$ between two consecutive points $P_i, P_{i+1} \in C_j$ must be orthogonal to the tangent $M_{i+1/2}'(P_{i+1/2})$ of the meridian that passes through the point $P_{i+1/2} \in C_j$, which lies midway between the points $P_i$ and $P_{i+1}$. Then the constraint equation is:

$$(p_{i+1} - p_i,\; q_{i+1} - q_i) \cdot M_{i+1/2}'(P_{i+1/2}) = 0 \quad \forall P_i, P_{i+1} \in C_j \tag{21}$$

Application of ISC provides $n - 1$ equations for each cross section curve.

There are $2n$ unknowns for each cross section curve, and two more unknowns, $(p_c, q_c)$, for the whole SHGC. By combining the two constraints, we have $2n - 1$ constraints for each cross section. Then for each cross section there are three degrees of freedom, as in the case of a ZGC surface discussed in section 4.


Planarity of Meridians The meridians of an SHGC are planar, as discussed in section 2.3. The shared boundary constraint can therefore be applied along a meridian curve as if the curve were obtained by cutting the surface of the SHGC with a plane along the meridian. The shared boundary constraint is applied along a meridian, $M$, between the gradient, $(p_m, q_m)$, of the plane that the meridian $M$ rests on and the gradient $(p_j, q_j)$ of the points $P_j \in M$, using the tangent, $M'(P_j)$, of the meridian curve at each point $P_j \in M$. The constraint equation is:

$$(p_m - p_j,\; q_m - q_j) \cdot M'(P_j) = 0 \quad \forall P_j \in M \tag{22}$$

Enforcing one meridian curve to be planar automatically makes the others planar too. Therefore, the planarity constraint is applied to only one of the meridians, giving $m$ constraint equations at the expense of two additional unknowns.

In total there are now $2nm + 4$ unknowns: $2nm$ for the $(p,q)$ of $nm$ points on the surface, two for $(p_c, q_c)$, and two more for $(p_m, q_m)$. There are $2nm$ constraint equations: $nm$ from the CSBC between the cross sections and the face of the surface, $m(n-1)$ from the ISC, and $m$ from the CSBC of a meridian curve. That is, there are four degrees of freedom in recovering the orientation of all the points on an SHGC. These four degrees of freedom correspond to the orientation, $(p_c, q_c)$, of the cross sections and the orientation, $(p_m, q_m)$, of the plane containing the chosen meridian. Without any assumptions we could set these four variables arbitrarily and get a valid reconstruction of the SHGC that would project like the figure in the image plane. However, not all of these reconstructions look natural to humans when they observe the image of the contours of an SHGC; humans prefer some interpretations over others. In the following section we propose orthogonality as the preference criterion.
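The tally of unknowns and constraints stated above can be written out explicitly. The lines below only restate the counts given in the text, for $m$ cross sections with $n$ points each:

```latex
\begin{aligned}
\text{unknowns}    &= \underbrace{2nm}_{(p,q)\text{ at } nm \text{ points}}
                      + \underbrace{2}_{(p_c,\,q_c)}
                      + \underbrace{2}_{(p_m,\,q_m)} = 2nm + 4, \\
\text{constraints} &= \underbrace{nm}_{\text{CSBC, cross sections}}
                      + \underbrace{m(n-1)}_{\text{ISC}}
                      + \underbrace{m}_{\text{CSBC, one meridian}} = 2nm, \\
\text{degrees of freedom} &= (2nm + 4) - 2nm = 4 .
\end{aligned}
```

The result does not depend on $n$ or $m$, so sampling the surface more densely does not reduce the remaining ambiguity.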


5.0.1  Orthogonality 

For SHGCs we use the orthogonality of the 3-D tangents of the cross sections and the meridian curves, making each little patch, formed by dividing the surface along meridians and cross sections, orthogonal. We can apply the orthogonality constraint using equation 17. This constraint is not always exactly satisfied, except for surfaces of revolution. Therefore we minimize the deviation from orthogonality:

$$E = \sum_i \sum_j \cos^2 \theta_{ij}, \qquad \cos\theta_{ij} = \frac{(C_i'(P_{ij}))_3 \cdot (M_j'(P_{ij}))_3}{|(C_i'(P_{ij}))_3|\;|(M_j'(P_{ij}))_3|} \tag{23}$$

where $(C_i'(P_{ij}))_3$ and $(M_j'(P_{ij}))_3$ are the 3-D tangents of the cross section and meridian curves at point $P_{ij}$. These 3-D tangents depend on their 2-D tangents in the image and on the orientation $(p_{ij}, q_{ij})$ of the surface at point $P_{ij}$, as given by equation 17. The gradient $(p_{ij}, q_{ij})$ at each point depends on the four variables, $(p_c, q_c)$ and $(p_m, q_m)$, discussed in the previous section. We would like to minimize the function $E$ over $(p_c, q_c)$ and $(p_m, q_m)$. From our experiments, however, we observe that minimization of $E$ chooses values that are always consistent with the assumption that the 3-D axis of the SHGC is orthogonal to its cross section.

If we enforce the cross sections to be orthogonal to the axis of the SHGC, the orientation $(p_c, q_c)$ of the cross section lies along a line in the $p$-$q$ space that passes through the origin and is in the direction of the image of the axis of the SHGC. This constraint also, in effect, enforces the plane of the meridians, with gradient $(p_m, q_m)$, to be orthogonal to the plane of the cross sections, with gradient $(p_c, q_c)$. That is:

$$(p_m, q_m, 1) \cdot (p_c, q_c, 1) = 0 \tag{24}$$

For simplicity, say the coordinate system is rotated such that the image of the axis of the SHGC is aligned with the $y$ axis of the coordinate system. Then we have $p_c = 0$ from the orthogonality of the axis to the cross section and $q_m = -1/q_c$ from equation 24. The parameters $p_m$ and $q_c$ are the free variables to be fixed by minimizing the function $E$. However, the minimum of the function $E$ does not fix the variable $q_c$ (except for surfaces of revolution). Either the function forms a valley along $q_c$, making any choice as good as any other, or it fixes $q_c$ to be zero, which is not a realistic solution. We use the same method for estimating $q_c$ as described for ZGC surfaces in section 4.

5.1  Results 

We have implemented the constraints discussed in the previous section in a somewhat reverse order. For an SHGC whose axis is aligned with the $y$ axis of the coordinate system the method is as follows. First the ellipse fit algorithm is applied to compute $q_c$; then the function $E$ is minimized to compute $p_m$. Then the surface is constructed using the constraints discussed in section 5 to compute the surface orientation at each point. Figure 11 shows the needle images and the shaded images, with the computed surface orientations, of the SHGCs in figure 1.

6  Quantitative  Shape  Recovery  of 
PRCGC  Surfaces 

Here  we  discuss  the  application  of  the  three  constraints 
discussed  in  section  3  along  a  cross  section  curve  of  a 
PRCGC,  to  recover  the  surface  orientation  of  a  PRCGC. 

CSBC The shared boundary constraint can be applied along the image of a cross section curve. Let $(p_c, q_c)$ be the gradient of the plane that contains the cross section curve, $C(u)$, whose image is the image curve $C_I(u) = (c_x(u), c_y(u))$. Let $(p(u), q(u))$ be the orientation of the points along the cross section curve $C(u)$. Then the shared boundary constraint is:

$$(p_c - p(u),\; q_c - q(u)) \cdot (c_x'(u),\; c_y'(u)) = 0 \tag{25}$$

Figure 11: The needle images and the shaded images generated with the computed gradients at each point of the SHGCs in figure 1.

ISC Theorem 7 indicates that ISC is applicable along the cross sections of a PRCGC because the cross sections of a PRCGC join the parallel symmetric points of the meridian curves, which are parallel symmetric as given by lemma 1. Since the meridians of a PRCGC are parallel symmetric with the cross section curves forming the correspondence, the tangent vectors of the meridians along a cross section have a constant direction, which is also parallel to the axis of the PRCGC as given by equation 7. Let the tangent direction of the meridians along the cross section $C(u)$ be $A'$ and its image be $A_I' = (a_x', a_y')$; note that $A_I'$ is independent of the $u$ parameter. For the sake of simplicity let us assume that the coordinate system is rotated such that $A_I'$ is along the $y$ axis of the coordinate system; then $a_x' = 0$. The inner surface constraint is:

$$\frac{d}{du}\,(p(u), q(u)) \cdot (a_x', a_y') = 0 \;\Rightarrow\; q'(u)\,a_y' = 0 \;\Rightarrow\; q(u) = q_0 \tag{26}$$

By combining this constraint with the CSBC given in equation 25 we get:

$$p(u) = \frac{c_y'(u)\,(q_c - q_0)}{c_x'(u)} + p_c \tag{27}$$

Orthogonality The last constraint is the orthogonality of the meridians to the cross section curves. The reader can easily verify that the $u$ and $t$ parameter curves in equation 3 are orthogonal to each other at all points on the surface $S$ of the PRCGC. We therefore enforce the tangent of the meridians, $A'$, whose image is $A_I' = (0, a_y')$, to be orthogonal to the tangent of the cross section curve $C$, whose image is $C_I(u) = (c_x(u), c_y(u))$, at a point on the surface whose gradient is $(p(u), q_0)$:

$$(0,\; a_y',\; q_0 a_y') \cdot (c_x'(u),\; c_y'(u),\; p(u)\,c_x'(u) + q_0\,c_y'(u)) = 0 \tag{28}$$

By substituting $p(u)$ given in equation 27 in the above equation we get:

$$a_y'\,c_y'(u)\,(1 + q_0 q_c) + a_y'\,q_0\,p_c\,c_x'(u) = 0 \tag{29}$$

Since the above equation must hold for all values of $u$ we get both $p_c = 0$ and

$$1 + q_0 q_c = 0 \;\Rightarrow\; q_0 = -1/q_c \tag{30}$$

Fixing $q_c$ fixes the orientation of the surface along the cross section $C$ together with the gradient $(p_c, q_c)$ (which is $(0, q_c)$ in the rotated coordinate system) of the plane containing $C$. However, our constraint equations do not constrain $q_c$.
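Once a value of $q_c$ has been chosen, the orientation along a PRCGC cross section follows in closed form. The sketch below assumes the rotated coordinate system of this section (image of the meridian tangent along the $y$ axis, $p_c = 0$) and image tangents with non-zero $x$ component; the function names and sample values are hypothetical. It computes $q_0$ and $p(u)$ and evaluates the left-hand sides of the shared boundary and orthogonality constraints for checking.

```python
def prcgc_orientation(q_c, cx_prime, cy_prime, p_c=0.0):
    """Gradient (p(u), q0) at a cross-section point with image tangent
    (cx_prime, cy_prime); requires cx_prime != 0 and q_c != 0."""
    q0 = -1.0 / q_c                                   # from 1 + q0 q_c = 0
    p_u = cy_prime * (q_c - q0) / cx_prime + p_c      # CSBC solved for p(u)
    return p_u, q0

def csbc_lhs(p_c, q_c, p_u, q_u, cx_prime, cy_prime):
    """Left-hand side of the shared boundary constraint along the cross section."""
    return (p_c - p_u) * cx_prime + (q_c - q_u) * cy_prime

def orthogonality_lhs(a_y_prime, p_u, q0, cx_prime, cy_prime):
    """3-D dot product of the meridian tangent and the cross-section tangent."""
    meridian3 = (0.0, a_y_prime, q0 * a_y_prime)
    section3 = (cx_prime, cy_prime, p_u * cx_prime + q0 * cy_prime)
    return sum(m * s for m, s in zip(meridian3, section3))

q_c = 0.8                                    # hypothetical chosen value
tangents = [(1.0, 0.5), (-0.3, 2.0), (0.7, -1.1)]
checks = []
for cx_prime, cy_prime in tangents:
    p_u, q0 = prcgc_orientation(q_c, cx_prime, cy_prime)
    checks.append((csbc_lhs(0.0, q_c, p_u, q0, cx_prime, cy_prime),
                   orthogonality_lhs(1.0, p_u, q0, cx_prime, cy_prime)))
```

Both entries of every pair in `checks` are zero up to rounding, for any non-zero `q_c`; this reflects the remark that the constraints themselves leave $q_c$ free.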

6.1  Recovering  Cross  Section  Curves 

In the previous section we have discussed how to recover the surface orientation at each point on a cross section curve given the image of the cross section curve. However, it is not directly possible to obtain the images of the cross section curves of a PRCGC, such as the ones in figure 2, except at the ends of the PRCGC, where we assume the cross section curve is given. That is, we assume that the surface is cut along its cross sections. Here we discuss a method for recovering the cross sections when one or both ends of the PRCGC are available; the method also enables us to reconstruct the 3-D PRCGC from its image.

At one end of the PRCGC let the image of the end cross section curve be $C_I(u) = (c_x(u), c_y(u))$ and the image of the axis be $A_I(t) = (a_x(t), a_y(t))$, as in the previous section. The tangent of the image of the axis at the point where it intersects the cross section $C$ is $A_I'(0) = (a_x'(0), a_y'(0))$. Say the coordinate system is rotated such that $a_x'(0) = 0$, and the orientation $(p_c, q_c)$ of the plane containing the cross section curve $C$ is computed using the constraints discussed in section 6 with $p_c = 0$. The orientation of the points along the cross section curve $C$ is $(p(u), q_0)$ where $q_0 = -1/q_c$ and $p(u)$ is given by equation 27. Since the meridian curves are parallel symmetric to the axis of the PRCGC we can use the gradient $(p(u), q_0)$ to recover the tangent of the 3-D axis at $t = 0$ as:

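$$A'(0) = \left(a_x'(0),\; a_y'(0),\; -\left(p(u)\,a_x'(0) + q_0\,a_y'(0)\right)\right) = \left(0,\; a_y'(0),\; \frac{a_y'(0)}{q_c}\right) = \frac{a_y'(0)}{q_c}\,(0, q_c, 1) \qquad (31)$$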

That is, $A'(0)$ is parallel to the normal, $(0, q_c, 1)$, of the
plane containing the cross section C, or the plane $\Pi_a$
containing $A'(0)$ is orthogonal to the plane of C. Also,
since the axis, A, of the PRCGC is planar, the plane $\Pi_a$
contains the whole axis curve A.

In the following we give an algorithm for recovering
the 3-D cross sections from the image of a PRCGC given
the gradient $(p_a, q_a)$ of the plane $\Pi_a$ containing the axis.
Then in the next subsection we give a method for computing
$(p_a, q_a)$ from the image.

The gradient $(p_c, q_c)$ of the plane of the cross section
C can be computed if the gradient $(p_a, q_a)$ of $\Pi_a$ is
given. The gradient $(p_c, q_c)$ must lie on a line that passes
through the origin in the direction of $A_i'(0)$ (in our
case $p_c = 0$), and $(p_c, q_c, 1)$ is orthogonal to $(p_a, q_a, 1)$.
Then:

$$(0, q_c, 1) \cdot (p_a, q_a, 1) = 0 \;\Rightarrow\; q_c = -1/q_a \qquad (32)$$

We can compute the 3-D cross section C from its
image $C_i$ by backprojecting $C_i$ onto a plane having
gradient $(p_c, q_c)$.
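
As a small illustration of the backprojection step, the sketch below lifts image points onto a plane under orthographic projection. It assumes the plane is written as p*x + q*y + z = d, so that its normal is (p, q, 1) as in the text. The type names, the function names, and the depth offset d (which orthographic projection leaves undetermined) are illustrative choices and not part of the original method.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } Point2;     /* image point */
typedef struct { double x, y, z; } Point3;  /* scene point */

/* Lift one image point onto the plane p*x + q*y + z = d.
   Orthographic projection along z keeps x and y unchanged. */
Point3 backproject_point(Point2 img, double p, double q, double d)
{
    Point3 out;
    out.x = img.x;
    out.y = img.y;
    out.z = d - p * img.x - q * img.y;
    return out;
}

/* Lift a sampled image curve of n points onto the same plane. */
void backproject_curve(const Point2 *img, Point3 *out, size_t n,
                       double p, double q, double d)
{
    assert(img != NULL && out != NULL);
    for (size_t k = 0; k < n; ++k)
        out[k] = backproject_point(img[k], p, q, d);
}
```

For example, with (p, q) = (0, 0.5) and d = 3 the image point (1, 2) is lifted to the scene point (1, 2, 2).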

If the cross section is rotationally symmetric* the
algorithm for recovering cross sections is much simpler. In
the following we give an algorithm that applies to the
general, not necessarily rotationally symmetric, case.

It can be shown that the image of the axis of the
PRCGC, $A_i(t)$, is not always the same as the axis, $B_i(t)$,
of the parallel symmetry of the image of the limb edges,
where the axis of the PRCGC is the trace of a single point
on the cross section as the cross section is swept. This is
shown in figure 12. However, the image curves $A_i(t)$ and
$B_i(t)$ are always parallel symmetric to each other such
that the corresponding points are on the same cross section.
By using lemma 1 and theorem 5 we conclude that
the images of the limb edges are parallel symmetric to
each other (and of course to their axis) as well as to the
images of the meridians of the surface. The meridians of the
surface are parallel symmetric to the axis of the PRCGC
by equation 7, and so are their images. Therefore the axis of
the image of the limb edges, the $B_i(t)$ curve, is parallel
symmetric to the image of the axis of the PRCGC, the
$A_i(t)$ curve.

*A planar cross section is rotationally symmetric iff every
line passing through the center of the cross section intersects
both sides of the cross section at equal distances.


Figure 12: A PRCGC with a non-rotationally symmetric
cross section.


Figure 13: A PRCGC with (a) none, (b) one, and (c)
both end cross sections available.

Suppose we take the axis A of the PRCGC to be the trace of
the point that is the backprojection of $B_i(0)$ onto the plane
of the cross section C. Then $A_i(0) = B_i(0)$. Given the
orientation $(p_a, q_a)$ of the plane $\Pi_a$ containing the axis A,
the 3-D cross section at a point $P_i$ on the image axis $B_i$
is recovered as follows. The backprojection C of $C_i$ is
rotated by the rotation matrix $R(B'(0), B'(P))$ to obtain the
3-D cross section curve $C_P(u)$ at point P, where $B'(0)$ and
$B'(P)$ are obtained by backprojecting $B_i'(0)$ and $B_i'(P_i)$
onto the plane $\Pi_a$. Then the points $P_1$ and $P_2$ that produce
the limb edges on the cross section $C_P(u)$ are identified by
equating the image tangents of $C_P(u)$ to the image tangents
of the limb boundaries at $P_1$ and $P_2$. The cross section $C_P$
is then positioned in 3-D such that $C_P(P_1)$ and $C_P(P_2)$
project as the points $P_1$ and $P_2$ on the image and the
point $P_p$ on $C_P$ that corresponds to the point A(0) on
C lies on the plane $\Pi_a$. This gives the relative position of
the cross section $C_P$ with respect to the end cross section C
in 3-D.


6.2  Computing $(p_a, q_a)$

The gradient $(p_a, q_a)$ of the plane $\Pi_a$ containing the
axis is computed by performing a search in the gradient
plane. The objective of the search is to compute the
$(p_a, q_a)$ that gives a valid reconstruction. A valid
reconstruction is one that makes the projections of the cross
section points $C_P(P_1)$ and $C_P(P_2)$ exactly the same as
the points $P_1$ and $P_2$ on the image plane. We form an
objective function which is the average distance, on the
image plane, of the reconstructed and projected point
$C_P(P_2)$ to the point $P_2$ when $C_P(P_1)$ and $P_1$ are aligned
exactly. Then this objective function is minimized over
$(p_a, q_a)$.
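
The text does not say which search procedure is used. One simple possibility, shown here purely as a sketch, is an exhaustive grid search around a starting gradient. The callback type Objective, the function search_gradient, and the placeholder demo_objective (a quadratic bowl standing in for the real image-plane distance) are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of a candidate axis-plane gradient (pa, qa); lower is better. */
typedef double (*Objective)(double pa, double qa, void *ctx);

typedef struct { double pa, qa, cost; } SearchResult;

/* Placeholder objective: a quadratic bowl with its minimum at (0.5, -1.0).
   A real objective would reconstruct the cross sections and measure the
   image-plane distance described in the text. */
double demo_objective(double pa, double qa, void *ctx)
{
    (void)ctx;
    return (pa - 0.5) * (pa - 0.5) + (qa + 1.0) * (qa + 1.0);
}

/* Evaluate f on a (2*steps+1)^2 grid centred on (p0, q0), keep the best. */
SearchResult search_gradient(Objective f, void *ctx, double p0, double q0,
                             double half_width, int steps)
{
    assert(f != NULL && half_width > 0.0 && steps >= 1);
    SearchResult best = { p0, q0, f(p0, q0, ctx) };
    double h = half_width / steps;          /* grid spacing */
    for (int i = -steps; i <= steps; ++i) {
        for (int j = -steps; j <= steps; ++j) {
            double pa = p0 + i * h;
            double qa = q0 + j * h;
            double cost = f(pa, qa, ctx);
            if (cost < best.cost) {
                best.pa = pa;
                best.qa = qa;
                best.cost = cost;
            }
        }
    }
    return best;
}
```

A grid search is easy to reason about but scales poorly. A gradient-free local optimizer started from the initial estimate would be a natural alternative under the same interface.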

The search is facilitated by finding a good initial point
for $(p_a, q_a)$ using the shapes of the end cross sections.
The analysis in section 6 shows that the gradient $(p_c, q_c)$
of the cross section at one end is constrained to be on
a line in the gradient space. A particular value on
that line may be chosen by using the ellipse fit discussed
in section 4. Similar analysis applies to the
other end of the PRCGC (if available). Say the orientation
of the plane containing the other end cross section
$C_n$ is $(p_n, q_n)$. Then the plane of $C_n$ is orthogonal
to the plane $\Pi_a$. If $(p_n, q_n)$ is not equal to $(0, q_c)$ we
can compute an initial normal $N_a = (p_a, q_a, 1)$ of $\Pi_a$
(up to scale) as $N_a = (p_n, q_n, 1) \times (0, q_c, 1)$. If the
other end cross section $C_n$ is not available then the gradient
$(p_a, q_a)$ is constrained to be on a line by its orthogonality to
$(0, q_c)$. The equation of the line containing $(p_a, q_a)$ is
$(0, q_c, 1) \cdot (p_a, q_a, 1) = 0$. Any particular value of $(p_a, q_a)$
may be chosen on this line as the initial $(p_a, q_a)$. Figure
13 shows that perception is more definite when both
ends are available, which confirms the above observation
that two ends are more informative than one.
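
The initial estimate from two end cross sections can be sketched as follows. The code forms the cross product of the two end-plane normals and rescales it so that the third component is 1, which yields the gradient of the axis plane. The names Vec3, cross, and initial_axis_gradient, and the tolerance used to detect a vanishing third component, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vec3;

/* Standard 3-D cross product a x b. */
Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 c;
    c.x = a.y * b.z - a.z * b.y;
    c.y = a.z * b.x - a.x * b.z;
    c.z = a.x * b.y - a.y * b.x;
    return c;
}

/* Initial gradient (pa, qa) of the axis plane from the end-plane
   gradients (pn, qn) and (0, qc). Returns 1 on success, or 0 when the
   third component of the cross product vanishes, in which case the
   normal cannot be rescaled to the form (pa, qa, 1). */
int initial_axis_gradient(double pn, double qn, double qc,
                          double *pa, double *qa)
{
    assert(pa != NULL && qa != NULL);
    Vec3 end_n = { pn, qn, 1.0 };
    Vec3 end_c = { 0.0, qc, 1.0 };
    Vec3 na = cross(end_n, end_c);
    if (na.z > -1e-12 && na.z < 1e-12)
        return 0;
    *pa = na.x / na.z;
    *qa = na.y / na.z;
    return 1;
}
```

Expanding the product gives (q_n - q_c, -p_n, p_n q_c), so whenever p_n q_c is nonzero the result has q_a = -1/q_c, in agreement with equation 32.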

6.3  Results 

We have implemented the cross section recovery method
described in section 6.1. In the implementation, first
the orientations $(p_c, q_c)$ and $(p_n, q_n)$ of the end cross
sections are computed. Then the normal $N_a$ of $\Pi_a$
is found by searching around the gradient given by
$(p_c, q_c, 1) \times (p_n, q_n, 1)$ for a value that gives a valid
reconstruction. The 3-D position of each cross section is then
found by translating and rotating the end cross section and
aligning it with the limb boundaries and the plane of the axis
$\Pi_a$. Figure 14 shows the recovered cross sections and figure
15 shows the recovered orientations by both needle and
shaded images for the PRCGCs given in figure 2.

7  Conclusions 

In this paper we have analyzed two classes of objects:
Straight Homogeneous Generalized Cones (SHGCs) and
Planar Right Constant cross section Generalized Cones
(PRCGCs).

We show the property of the limb boundaries of
SHGCs, under both orthographic and perspective projection,
that the tangents of the images of the limb
boundaries, if extended from the points on the same
cross section, intersect the image of the axis of the SHGC
at the same point. We also show that the cross sections
of an SHGC are parallel symmetric, and use that property
to recover the images of the cross sections of the
SHGC. Then we apply three constraints, the curved shared
boundary constraint (CSBC), the inner surface constraint
(ISC), and the orthogonality constraint (OC), to the
SHGCs. An SHGC has four degrees of freedom if it is
to be recovered from the images of its contours without
any assumptions. With the assumption of orthogonality
there is only one degree of freedom, which is fixed
by estimating the orientation of the cross sections with
an ellipse fit algorithm. Some computational results are
shown on synthetic data.

For  PRCGCs  the  limb  boundaries  are  shown  to  project 
as parallel symmetric curves, which enables us to find
points  on  the  limb  boundaries  that  correspond  to  the 
same  cross  section.  We  also  show  that  the  three  con¬ 
straints,  CSBC,  ISC  and  OC,  are  applicable  along  the 
cross  section  of  a  PRCGC.  We  applied  the  constraints 
to  the  ends  where  the  cross  sections  are  available.  Then 
we  present  an  algorithm  to  reconstruct  the  3-D  PRCGC 
from  the  images  of  its  contours,  using  the  ellipse  fit 
method  to  recover  the  orientations  of  cross  sections  at 
the  ends. 

We  have  assumed  that  the  object  boundaries  and  sym¬ 
metries  are  given.  Detection  and  computation  of  sym¬ 
metries  may,  in  itself,  be  a  difficult  task  in  real  images. 
However,  we  do  provide  tests  that  can  be  used  to  verify 
symmetry properties. Also, we believe that the 3-D shape
recovery process will serve as an aid in the segmentation and
boundary labelling processes as well. In the future, we
hope to explore this aspect of the problem.


Appendix 

A  Proof of Theorem 6


Let X(u, v) be the local parameterization of the surface S
around the point $P \in C(t)$ such that for $P = X(u_0, v_0)$,
the curve $X(u, v_0)$ is the curve C and the curve $X(u_0, v)$
is in the direction R. That is, the u parameter curve is
along the curve C and the v parameter curve is in the
direction R at the point P. Here we have to show that
$\frac{d(p,q)}{du} \cdot R_I = 0$ where (p, q) is the normal of the
surface in the gradient space, du is in the direction of C', and
$R_I$ is the image, $(x_v, y_v)$, of the vector $R = X_v = (x_v, y_v, z_v)$
under orthographic projection.

The normal, N, of this surface at any point is given by:

$$N = \frac{X_u \times X_v}{|X_u \times X_v|} \qquad (33)$$

Then, the functions $dC/dt$ and $dN_R$ are:

$$\frac{dC}{dt} = \frac{dX}{dv} = X_v \qquad (34)$$

$$dN_R = \frac{dN}{du} = N_u \qquad (35)$$


By equation 10 we have $X_v \cdot N_u = 0$. Let the normal N
of the surface around point P be represented in the (p, q)
space as $N = c(p, q, 1)$, where c is the scale coefficient
and equal to $(p^2 + q^2 + 1)^{-1/2}$. Differentiation of N with
respect to the parameter u gives:

$$N_u = c_u\,(p, q, 1) + c\,(p_u, q_u, 0) = \frac{c_u}{c} N + c\,(p_u, q_u, 0) \qquad (36)$$

If we set $X_v \cdot N_u = 0$ where $X_v = (x_v, y_v, z_v)$ and $N_u$
is given in equation 36 we get:

$$X_v \cdot N_u = \frac{c_u}{c}\, X_v \cdot N + c\,(x_v, y_v, z_v) \cdot (p_u, q_u, 0) = 0 \qquad (37)$$

We also have $N \cdot X_v = 0$ from 33. Therefore

$$x_v p_u + y_v q_u = 0 \;\Rightarrow\; \frac{d(p,q)}{du} \cdot R_I = 0 \qquad (38)$$

$\Box$

Figure 15: The recovered orientations shown by both needle image and by shading the objects for the PRCGCs in
figure 2.

Abstract

We  describe  a  methodology  for  developing  effi¬ 
cient  parallel  implementations  of  high  level  vi¬ 
sion  algorithms.  Efficiency  is  defined  in  terms 
of  algorithm  speedup,  processor  efficiency,  sys¬ 
tem  complexity,  and  programmer  burden.  Al¬ 
gorithm  speedup  and  processor  efficiency  are 
critical  issues  in  the  parallel  implementation 
of  high  level  vision  tasks  as  the  required  algo¬ 
rithms  often  utilize  computationally  intensive 
techniques.  Furthermore,  due  to  their  usage 
of  complex  code  and  data  structures,  system 
complexity  and  maintenance  costs  can  become 
excessive  if  care  is  not  taken  in  the  design  of 
the  implementation.  Most  researchers  empha¬ 
size  speedup  and  efficiency  with  little  regard 
to  system  complexity  and  programmer  burden. 

We  show  that  through  our  design  procedure,  all 
four  issues  can  be  sufficiently  addressed. 

1  Introduction 

Computer  vision  systems  are  comprised  of  tasks  that 
can  be  categorized  into  three  levels,  low,  mid,  and 
high.  Across  the  levels,  a  wide  variety  of  algorithmic 
techniques  are  utilized  ranging  in  complexity  from  sim¬ 
ple  repetitive  processing  to  elaborate  rule-based  control 
structures.  Also,  the  amount  of  active  data  at  any  given 
point  in  the  system  execution  can  range  from  tens  of 
thousands  of  individual  scalar  values  to  a  few  multi-field 
record  structures.  Each  diverse  algorithm  utilized  in  a 
computer  vision  system  taxes  a  classical  von  Neumann 
(serial)  architecture  in  one  way  or  another. 

In low-level vision, the multiplications and additions
required by convolutional processing are easily executed
on a serial machine but the amount of data on which
they must operate (the image plane) overwhelms it.
Conversely, the small number of abstract data structures
utilized in high-level vision is easily maintained
by a von Neumann machine but the amount of processing
and the complex control structures required to
search through a solution space containing permutations
of the data soon exceed the capabilities of the
machine. To summarize, computer vision systems challenge
serial machines through both data intensive and
compute intensive operations. These challenges have
made parallel implementation of computer vision systems
an important topic within the computer vision research
community [Ahuja and Swamy, 1984, Weems and
Levitan, 1987, Kuehn et al., 1985, Little et al., 1987,
Rosenfeld et al., 1986, Hamey et al., 1988, Stout, 1988,
Sunwoo and Aggarwal, 1989].

We  are  interested  in  investigating  the  inherent  com¬ 
plexities  of  computer  vision  systems  and  how  those  com¬ 
plexities  can  be  tolerated  via  efficient  use  of  a  parallel 
processor  architecture.  We  define  efficient  in  terms  of 
four  measures. 

Algorithm  Speedup  is  a  measure  of  the  reduction  in  ex¬ 
ecution  time  when  moving  from  a  sequential  to  a  parallel 
algorithm  implementation.  This  is  a  standard  measure 
in  the  study  of  parallel  processing. 

Processor  Efficiency,  also  referred  to  as  load  balanc¬ 
ing,  is  a  measure  of  the  amount  of  inherent  parallelism, 
or  conversely,  the  amount  of  inherent  serialism,  within 
an  algorithm  as  well  as  how  well  suited  the  target  par¬ 
allel  processor  architecture  is  to  the  algorithmic  require¬ 
ments.  This  too  is  a  standard  measure  in  the  study  of 
parallel  processing. 

System  Complexity  is  a  measure  of  how  closely  the  par¬ 
allel  implementation  of  the  algorithm  resembles  the  serial 
implementation,  or  conversely,  how  closely  it  resembles 
the  parallel  processor  architecture.  This  is  a  measure 
that  we  are  introducing  as  it  plays  an  important  role  in 
the  life  cycle  of  a  computer  system,  both  software  and 
hardware. 

Programmer  Burden  is  a  measure  of  the  degree  of  dif¬ 
ficulty  in  developing  and  maintaining  the  parallel  algo¬ 
rithm  implementation.  This  is  also  a  measure  that  we 
are  introducing  as  it  too  plays  an  important  role  in  the 
life  cycle  of  a  computer  system. 
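
The first two measures above have conventional quantitative forms: speedup S = T_serial / T_parallel and efficiency E = S / P for P processing elements. The text defines the measures qualitatively, so these formulas are the usual textbook forms and are assumed here rather than taken from the paper. System complexity and programmer burden are not given a numeric form. The names ParallelMetrics and parallel_metrics are hypothetical.

```c
#include <assert.h>

typedef struct {
    double speedup;     /* T_serial / T_parallel            */
    double efficiency;  /* speedup per processing element   */
} ParallelMetrics;

/* Compute speedup and efficiency from measured execution times. */
ParallelMetrics parallel_metrics(double t_serial, double t_parallel,
                                 int num_pes)
{
    assert(t_serial > 0.0 && t_parallel > 0.0 && num_pes > 0);
    ParallelMetrics m;
    m.speedup = t_serial / t_parallel;
    m.efficiency = m.speedup / num_pes;
    return m;
}
```

For instance, a run that takes 120 time units serially and 20 units on 8 processing elements has a speedup of 6 and an efficiency of 0.75.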

An abundance of research into the parallel implementation
of low and mid-level vision tasks on a variety of
machines has been performed [Rice and Jamieson, 1985,
Little et al., 1987, Kuehn et al., 1985, Stout, 1988,
Levitan, 1984, Weems, 1988, Hamey et al., 1988] but little
has been done with respect to high-level vision. Furthermore,
researchers have placed dramatic emphasis on
the issues of algorithm speedup and processor efficiency,
especially speedup, with little or no regard to system
complexity and programmer burden. Typically, the derived
parallel implementations provide good measures of
speedup and efficiency at the cost of obscure software
and costly, custom built hardware.

In  our  approach,  rather  than  select  a  parallel  architec¬ 
ture  then  map  an  algorithm  onto  it,  as  is  usually  done, 
we  perform  some  basic  analysis  steps  in  order  to  identify 
the  inherent  parallelism  contained  within  the  algorithm. 
We  then  specify  the  components  of  a  parallel  processor 
architecture  that  is  well  suited  to  the  requirements  of 
the  algorithm.  For  a  complete  computer  vision  system 
comprised  of  a  variety  of  algorithms,  we  specify  an  ar¬ 
chitecture  for  each  algorithm  that  is  well  suited  to  that 
algorithm.  These  architectures  can  then  be  realized  by 
either  a  single  heterogeneous  or  reconfigurable  parallel 
processor  architecture.  Through  this  approach  we  are 
able  to  address  the  issues  of  system  complexity  and  pro¬ 
grammer  burden  as  well  as  algorithm  speedup  and  pro¬ 
cessor  efficiency. 

High-level vision algorithms need increased throughput,
little research has addressed that need, and results on the
parallel implementation of low and mid-level vision
algorithms are already abundant. Our studies are therefore
centered around high-level vision.
In applying our approach to the parallel implementation
of a relaxation based image matching algorithm [Medioni
and Nevatia, 1984] we were able to:

•  Achieve  significant  algorithm  speedup. 

•  Achieve  significant  processor  efficiency. 

•  Design  a  parallel  processor  architecture  consisting 
of  commercially  available  components. 

•  Utilize  software  that  is  nearly  identical  to  that  used 
in  the  serial  implementation. 

In  the  following  sections  we  present  our  methodology 
for  developing  parallel  implementations  of  computer  vi¬ 
sion  algorithms  and  the  application  of  the  methodology 
to  the  relaxation  based  image  matching  algorithm. 

2  The  Methodology 

Research  into  the  parallel  implementation  of  computer 
vision  systems  classically  begins  with  the  specification 
of  a  parallel  processor  architecture  [Little  et  al.,  1987, 
Stout,  1988,  Reisis  and  Prasanna-Kumar,  1987].  This 
includes  the  specification  of  various  organizational  pa¬ 
rameters  such  as:  Programming  model,  SIMD,  MIMD,  or 
MISD  [Flynn,  1972];  Processing  elements  (PEs),  simple 
or  complex  instruction  set;  Processing  element  coupling, 
tightly  (shared  memory)  or  loosely  (message  passing); 
Processor  homogeneity,  homogeneous  (identical  process¬ 
ing  elements)  or  heterogeneous  (two  or  more  different 
types  of  processing  elements);  Processor  synchroniza¬ 
tion,  synchronous,  asynchronous,  or  loosely  synchronous; 
and Communication network topology, cube, mesh, pyramid,
etc. Details on these and other organizational
parameters can be found in [Hwang and Briggs, 1984].


Once  a  parallel  processor  architecture  has  been  de¬ 
signed  and  an  algorithm  selected,  one  then  proceeds  to 
implement  the  algorithm  on  the  architecture.  This  is 
a  two  step  process.  The  first  step  is  called  the  mapping 
problem [Bokhari, 1981] and involves two steps of its own.
The  second  step  is  development  of  the  actual  code.  We 
will  not  discuss  the  coding  step  as  it  involves  the  same 
effort  as  for  a  serial  algorithm  once  the  mapping  problem 
has  been  solved. 

The mapping problem is solved in two steps: the first
involves partitioning the algorithm into independent
processes and the second, assigning the processes to
individual processing elements. A formal statement of the
problem is: the search for a correspondence between the
interaction pattern of the algorithm processes and the
communication network topology of the architecture. A
good solution, or mapping, is one that minimizes the
communication overhead and thus maximizes the
efficiency and the speedup.
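
One way to make "communication overhead" concrete is to score a candidate mapping by the message traffic between each pair of processes, weighted by the network distance between the processing elements that host them. The sketch below does this. The cost model, the flat row-major matrices, and the name mapping_cost are assumptions for illustration and are not taken from [Bokhari, 1981].

```c
#include <assert.h>
#include <stddef.h>

/* Communication cost of one mapping.
   traffic[a * n_proc + b]: message volume between processes a and b
                            (symmetric, arbitrary units).
   hops[x * n_pe + y]:      network distance between PEs x and y.
   assign[a]:               PE that hosts process a.                */
long mapping_cost(const int *traffic, size_t n_proc,
                  const int *hops, size_t n_pe,
                  const size_t *assign)
{
    assert(traffic != NULL && hops != NULL && assign != NULL);
    long cost = 0;
    for (size_t a = 0; a < n_proc; ++a) {
        assert(assign[a] < n_pe);
        for (size_t b = a + 1; b < n_proc; ++b) {   /* each pair once */
            assert(assign[b] < n_pe);
            cost += (long)traffic[a * n_proc + b]
                  * hops[assign[a] * n_pe + assign[b]];
        }
    }
    return cost;
}
```

With three processes that communicate in a chain and three processing elements connected in a line, the identity assignment keeps communicating processes adjacent and scores lower than an assignment that separates the heaviest pair.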

With  this  approach,  if  an  algorithm  is  not  well  suited 
to  the  given  architecture,  the  designer  is  forced  into  de¬ 
veloping  an  obscure  algorithm  implementation  which  re¬ 
sembles  the  architecture  more  so  than  the  original  algo¬ 
rithm  specification. 

In  our  methodology  we  approach  the  problem  from 
the  opposite  direction.  That  is,  we  begin  by  analyzing 
the  algorithm  to  determine  its  processing  requirements 
then,  using  these  findings,  we  specify  a  parallel  processor 
organization  that  is  well  suited  to  the  requirements.  We 
proceed  in  four  basic  steps: 

•  Control  Structure  Analysis 

In  this  step  we  identify  the  independent  processes 
that  constitute  the  algorithm  through  inspection  of 
the  processing  constructs.  Of  primary  interest  are 
iterative  constructs  (loops)  that  determine  the  over¬ 
all  complexity  of  the  algorithm  and  offer  potential 
for  parallelization.  This  step  results  in  the  identifi¬ 
cation  of  the  inherent  parallelism  contained  within 
the  algorithm. 

•  Data  Structure  Analysis 

In  this  step  we  determine  the  data  requirements  of 
each  process  identified  above.  The  result  of  this  step 
is the specification of which data structures to partition
and how to partition them (distribute them
among processes).

•  Communication  Analysis 

Identification  of  the  independent  processes  and  the 
data  structure  partitioning  scheme  will  determine 
the  communication  requirements  between  the  pro¬ 
cesses.  That  is,  a  data  structure  may  be  distributed 
among  processes  such  that  one  process  is  assigned 
a  data  item  required  by  another  process  to  com¬ 
plete  its  task.  In  this  step  such  requirements  are 
determined  as  well  as  the  appropriate  communi¬ 
cation  protocols  for  their  implementation,  such  as 
synchronous  message  passing  among  all  processes, 
asynchronous message exchanges between two processes,
message broadcasting and reduction. The
result of this step will lead to the specification of
the communication network topology of the
architecture.

•  Architecture  Specification 

Given  the  results  of  the  previous  steps,  this  step 
is  where  we  specify  the  architecture  in  terms  of  its 
organizational  parameters.  The  result  is  the  specifi¬ 
cation  of  a  parallel  processor  architecture  well  suited 
to  the  requirements  of  the  specified  algorithm  in 
terms  of  speedup,  efficiency,  system  complexity,  and 
programmer  burden. 
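
As an illustration of the data structure analysis step, the sketch below shows one common partitioning scheme: dividing an array of items into contiguous blocks, one per process, with block sizes differing by at most one. Block partitioning is an assumed example of a scheme such an analysis might select, not necessarily the one chosen in this work, and the names Range and block_range are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t begin, end; } Range;   /* half-open [begin, end) */

/* Index range owned by process `rank` when n_items are split into
   n_procs contiguous blocks. The first (n_items % n_procs) processes
   receive one extra item each. */
Range block_range(size_t n_items, size_t n_procs, size_t rank)
{
    assert(n_procs > 0 && rank < n_procs);
    size_t base = n_items / n_procs;
    size_t extra = n_items % n_procs;
    Range r;
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

Splitting 10 items among 4 processes gives the ranges [0,3), [3,6), [6,8), and [8,10).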

We  have  found  that  this  approach  produces  high  de¬ 
grees  of  speedup  and  efficiency  via  software  that  resem¬ 
bles  the  serial  implementation  of  the  algorithm  and  is 
therefore  no  more  difficult  to  develop  and  maintain.  Fur¬ 
thermore,  this  approach  lends  itself  to  the  design  of  par¬ 
allel  implementations  of  complete  computer  vision  sys¬ 
tems  (heterogeneous  algorithm  suites)  which  can  be  im¬ 
plemented  via  a  reconfigurable  or  a  heterogeneous  par¬ 
allel  processor  architecture. 

3  Image  Matching  -  An  Application 

3.1  Overview 

Matching of two images (or a map and an image) is a
fundamental operation in computer vision. Various
solutions to the problem of finding correspondences
between images have been proposed ranging from
correlation [Rosenfeld and Kak, 1976] to graph isomorphism
[Ghahraman et al., 1980]. One primary distinction
among  the  proposed  solutions  is  the  level  of  description 
at  which  the  matching  is  performed.  Correlation  based 
techniques  typically  operate  directly  on  sensor  data  (pix¬ 
els)  whereas  graph  based  approaches  often  utilize  seman¬ 
tic  structures  such  as  roads  and  buildings. 

The  image  matching  algorithm  used  in  our  study  uti¬ 
lizes  a  discrete  relaxation  based  approach  to  matching. 
It  determines  correspondences  between  line  segments  de¬ 
tected  in  each  image  based  on  symbolic  descriptions  of 
the segments as well as geometrical relationships
between segments. The algorithm iterates over the solution
space until it converges to a stable state.

This algorithm was selected for study due, primarily,
to its applicability to high-level vision. However, the basic
approach utilized in the algorithm, relaxation, has been
used in other applications as well [Waltz, 1972, Rosenfeld
et al., 1976, Faugeras and Price, 1981, Rosenfeld and
Smith, 1981, Terzopoulos, 1986, Rutkowski et al., 1981].
Therefore, the results of this study can be generalized
to various other applications in low, mid, and high-level
vision.

In  the  following  sections  we  present  details  of  the  ap¬ 
plication  of  our  methodology  to  the  relaxation  based  im¬ 
age  matching  algorithm.  We  present  a  brief  description 
of  the  algorithm,  application  of  the  four  steps  that  con¬ 
stitute  our  methodology,  and  a  discussion  of  the  results 
of  the  application  in  terms  of  our  four  measures,  algo¬ 
rithm  speedup,  processor  efficiency,  system  complexity, 
and  programmer  burden. 


Figure  1:  Window  construction. 

3.2  Algorithm  Description 

This  image  matching  algorithm  [Medioni  and  Neva- 
tia,  1984]  receives  input  images  from  two  independent 
sources  and  then  attempts  to  construct  a  list  of  corre¬ 
spondences  between  them  using  a  relaxation  based  ap¬ 
proach.  We  provide  an  overview  of  the  algorithm  with 
enough  detail  to  discuss  our  algorithm  mapping  method¬ 
ology.  For  details  and  explanations  beyond  the  scope  of 
our  discussion,  the  reader  should  see  the  referenced  work. 

The  primitives  used  by  the  image  matching  algorithm 
are  linear  segments,  represented  symbolically  by  their 
end  point  coordinates,  orientation,  and  average  contrast. 
Given  two  sets  of  linear  segments  extracted  from  two  im¬ 
ages  (or  an  image  and  a  map),  the  object  is  to  find  cor¬ 
respondences  between  the  segments  of  each  set  based  on 
their  symbolic  descriptions  (local  constraints)  and  on  the 
geometric  relationships  between  segments  of  the  same 
image  (global  constraints.)  The  assumptions  made  prior 
to  matching  are  that:  1)  the  orientations  of  the  two  im¬ 
ages  are  nearly  the  same;  and  2)  the  scaling  factor  from 
one  image  to  the  other  is  approximately  known. 

The set of primitives, $A = \{a_i \mid 1 \le i \le n\}$, from one
image is called the SCENE and the primitives, $a_i$, are
called OBJECTS. The set of primitives, $L = \{l_j \mid 1 \le j \le m\}$,
from the other image is called the MODEL and the
primitives, $l_j$, are called LABELS. The algorithm
proceeds to compute the quantity $p(i,j)$ in $\{0, 1\}$, which is
the POSSIBILITY that object $a_i$ corresponds to label
$l_j$. It is possible that an object has no corresponding
label due to occlusion or scene change, that several
objects correspond to the same label due to fragmentation,
or that an object corresponds to several labels due to
merging. The method for computing $p(i,j)$ relies on
geometrical constraints, that is, when a label, $l_j$, is assigned
to an object, $a_i$, we expect to find an object, $a_h$, with a
label, $l_k$, in an area defined by i, j, and k. The area is
called a WINDOW and is denoted $w(i,j,k)$.

The method for computing $w(i,j,k)$ is as follows. The
object, $a_i$, is represented by the two dimensional vector
$A_iB_i$ and the label, $l_j$, by $P_jQ_j$. By "sliding" $l_j$ over
$a_i$ an area is described by the corresponding motion of
label $l_k$, $P_kQ_k$ (figure 1). This parallelogram shaped area
is the window $w(i,j,k)$.
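
Once a window has been constructed, testing whether an object falls inside it reduces to a point-in-parallelogram test. The sketch below shows one way to do this, representing the window by a corner and two edge vectors. The construction of the window itself follows [Medioni and Nevatia, 1984] and is not reproduced here. The representation, the names Window, point_in_window, and segment_in_window, and the reading of "lies within" as "both end points inside" are assumptions made for illustration.

```c
#include <assert.h>

typedef struct { double x, y; } Vec2;

/* Parallelogram with corners origin, origin+u, origin+u+v, origin+v. */
typedef struct { Vec2 origin, u, v; } Window;

/* z-component of the 2-D cross product. */
static double cross2(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; }

/* Write pt - origin as s*u + t*v; pt is inside (boundary included)
   when both coefficients lie in [0, 1]. */
int point_in_window(const Window *w, Vec2 pt)
{
    double det = cross2(w->u, w->v);
    assert(det != 0.0);                 /* edges must not be parallel */
    Vec2 d = { pt.x - w->origin.x, pt.y - w->origin.y };
    double s = cross2(d, w->v) / det;
    double t = cross2(w->u, d) / det;
    return s >= 0.0 && s <= 1.0 && t >= 0.0 && t <= 1.0;
}

/* A segment lies within the convex window iff both end points do. */
int segment_in_window(const Window *w, Vec2 a, Vec2 b)
{
    return point_in_window(w, a) && point_in_window(w, b);
}
```

Because a parallelogram is convex, checking the two end points is sufficient for the whole segment.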

Two object/label assignments, $(i,j)$ and $(h,k)$, are
COMPATIBLE, $(i,j)C(h,k)$, if and only if object $a_h$ lies
within window $w(i,j,k)$ and object $a_i$ lies within window
$w(h,k,j)$.

Using these definitions, the algorithm searches for
object/label correspondences by first identifying all possible
correspondences based on the symbolic descriptions
of the objects and labels. This set of correspondences

Figure 2: Image matching algorithm primary flow.


flag = 1;
while (flag) {
    flag = 0;

    /* Iteration of possibilities/compatibilities. */
    for (i = 0; i < number_of_objects; ++i)
        for (j = 0; j < number_of_labels; ++j)
            if (p[i][j]) {  /* if (p[i][j] == 1) */
                card_s = 0;

                /* Compute the degree of support for */
                /* the object/label assignment.      */
                for (k = 0; k < number_of_labels; ++k) {  /* for all labels */
                    make_window(objects[i], labels[j], labels[k], win_ijk);
                    h = 0; found = 0;

                    while ((h < number_of_objects) && (!found)) {
                        if (p[h][k] && in_window(objects[h], win_ijk))
                            found = compatible(objects[i], labels[j],
                                               objects[h], labels[k]);
                        ++h;
                    }  /* while ((h < number_of_objects) ... */
                    if (found)
                        ++card_s;
                }  /* for (k = ... */

                if (card_s < q) {
                    flag = 1;
                    p[i][j] = 0;
                }  /* if (card_s ... */
            }  /* if (p[i][j] ... */
}  /* while (flag) ... */


Figure  3:  Serial  code  for  image  matching  algorithm. 


constitutes the possibilities at iteration step 0, $p^0(i,j)$.
Subsequent values of $p(i,j)$ are computed by the iteration
formula:

$\forall (i,j),\ p^{t+1}(i,j) = 1$ if $p^t(i,j) = 1$ AND
$\exists$ a subset $S$ of $[1,m]$ (labels) with $q$ elements such that
$\forall s \in S$, $\exists k \in [1,n]$ (objects) such that $p^t(k,s) = 1$ and
$(i,j)C(k,s)$.

The algorithm halts when $\forall (i,j),\ p^{t+1}(i,j) = p^t(i,j)$.

The  value  q  is  the  fit  parameter.  If  a  perfect  match 
is  desired  then  its  value  should  be  set  to  m,  the  number 
of  labels.  Otherwise  it  should  be  set  to  a  value  deter¬ 
mined  by  the  desired  degree  of  match  between  the  two 
images.  A  flow  diagram  of  the  image  matching  algorithm 
is  provided  in  figure  2  and  a  code  segment  from  the  serial 
implementation  in  figure  3. 
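
To show the iteration running end to end, the following self-contained sketch applies the same discrete relaxation to a toy problem. It is not the authors' implementation: the window and compatibility tests are folded into a single callback, and demo_compat is a made-up rule that treats two assignments as compatible when the object offset equals the label offset. The names relax, CompatFn, MAX_OBJ, and MAX_LAB are likewise hypothetical. As in the serial code of figure 3, possibilities are updated in place during a sweep.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJ 8
#define MAX_LAB 8

/* Stand-in for the test (i,j) C (h,k). */
typedef int (*CompatFn)(int i, int j, int h, int k);

/* Toy rule: compatible when object and label offsets agree. */
int demo_compat(int i, int j, int h, int k)
{
    return (h - i) == (k - j);
}

/* Discrete relaxation over n objects and m labels with fit parameter q.
   p[i][j] is 1 while object i may still correspond to label j.
   Returns the number of sweeps executed, including the final sweep
   that detects no change. */
int relax(int p[MAX_OBJ][MAX_LAB], int n, int m, int q, CompatFn compat)
{
    assert(n > 0 && n <= MAX_OBJ && m > 0 && m <= MAX_LAB);
    assert(compat != NULL);
    int sweeps = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        ++sweeps;
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                if (!p[i][j])
                    continue;
                int card = 0;       /* number of supporting labels */
                for (int k = 0; k < m; ++k) {
                    int found = 0;
                    for (int h = 0; h < n && !found; ++h)
                        found = p[h][k] && compat(i, j, h, k);
                    card += found;
                }
                if (card < q) {     /* not enough support: discard */
                    p[i][j] = 0;
                    changed = 1;
                }
            }
        }
    }
    return sweeps;
}
```

With three objects, three labels, all possibilities initially 1, and q = 3, only the assignments with i equal to j keep support from every label, so the possibility matrix reduces to the identity.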


Figure  4:  Image  matching  primary  control  loops. 


3.3  Control  Structure  Analysis 

In analyzing the control structure of an algorithm our objective is to determine its overall time complexity and to identify the specific structures that dictate that time complexity, typically loop constructs. We call these constructs primary control structures. Identifying the primary control structures helps us to identify independent processes and, thus, areas where parallelism can be applied to provide significant algorithm speedup.

The time complexity of the image matching algorithm is determined as follows. Given a scene containing n objects and a model containing m labels, the maximum number of possible object/label pairs is nm, which occurs when every object is similar to every label. In the worst case, each iteration discards only one object/label pair, that is, sets its possibility to 0; since an iteration that discards nothing halts the algorithm, the process converges in at most nm iterations. During each iteration the algorithm computes the possibility of the object/label pair, which is a measure of how well it ‘fits’ with the remaining object/label pairs. In the worst case, this requires investigating nm pairs. Therefore, the complexity of the algorithm is $O(n^2m^2)$. If we assume an equal number of objects and labels, m, the algorithm time complexity can be expressed as $O(m^4)$. Figure 4 shows, pictorially, the four nested loops which implement this time complexity. These constitute the primary control structures.
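As a rough check on the loop count, the sketch below counts how often the innermost body of four nested loops runs in one pass when there are m objects and m labels. It is illustrative only: `count_checks` is a hypothetical name, and the early exit of the innermost search is ignored so that the worst case is counted.

```c
/* Worst-case number of innermost-body executions for one pass through
   four nested loops over m objects and m labels (no early exit). */
unsigned long count_checks(unsigned int m)
{
    unsigned long count = 0;
    for (unsigned int i = 0; i < m; ++i)              /* objects a_i */
        for (unsigned int j = 0; j < m; ++j)          /* labels  l_j */
            for (unsigned int k = 0; k < m; ++k)      /* labels  l_k */
                for (unsigned int h = 0; h < m; ++h)  /* objects a_h */
                    ++count;
    return count;
}
```

For m = 10 the count is 10,000, that is, m to the fourth power.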

Nested within the four loops is the possibility computation. As described above, it consists of checking whether or not a given object/label pair has any compatible object/label pairs. This, in turn, requires the computation of a window and the search for an object within it. Once a candidate object/label pair, $(a_i, l_j)$, has been queued, the possibility computation, $p(i,j)$, can proceed as $m^2$ independent computations. Each computation is structured so that it operates on an isolated data set, that is, successive passes through the inner loops (the possibility computation) are independent of one another. Thus, the possibility computation can proceed as multiple parallel processes and has the potential to provide significant algorithm speedup. For these reasons, it constitutes our process partitioning scheme.

Figure 5: Client/Server algorithm partitioning.

Figure 6: Image matching primary data structures.

Having selected the possibility computation as the process with which to partition the algorithm, we have produced a client/server model. That is, one process, the client, queues possible object/label pairings via the outer two primary control loops, and a set of independent processes, the servers, determines the possibility of each pairing in a distributed fashion by executing the inner two control loops and their encompassed procedures. Figure 5 shows the client/server algorithm partitioning.

3.4  Data  Structure  Analysis 

Having identified the possibility computation as the task on which to partition the algorithm into processes, we must now determine the data requirements of each computation. In doing so we will identify the primary data structures and determine an appropriate partitioning of these structures.

For the image matching algorithm, three primary data structures can be identified. The first two are linear arrays of size m of symbolic records, one array each for storage of the set of objects and the set of labels. The third data structure is an $m \times m$ matrix of logical values that store the results of the possibility computation, $p^t(i,j)$, for each iteration, t. Figure 6 shows the primary data structures pictorially.

Each possibility computation (process) requires two entries from the object array, $a_i$ and $a_h$, and two entries from the label array, $l_j$ and $l_k$. All processes receive the same $(a_i, l_j)$ pair, the object/label assignment under consideration, and each receives a unique $(a_h, l_k)$ pair, an object/label assignment that determines the global consistency of the pair under consideration. From these inputs the windows, $w(i,j,k)$ and $w(h,k,j)$, are formed. The relation $(i,j)C(h,k)$ is then computed by determining whether or not $a_h$ lies within $w(i,j,k)$ and $a_i$ lies within $w(h,k,j)$. A value of 1 is returned if the relation holds, otherwise a value of 0 is returned. The value of $p^{t+1}(i,j)$ is determined by summing the results from all of the individual processes and comparing that sum to the fit parameter, q.

Figure 7: Image matching horizontal swath partitions.

If we assume the availability of $N = m^2$ processing elements, the obvious way of partitioning the data structures is to assign each PE, $0 \le p \le N-1$, an object/label pair, $(a_h, l_k) \in A \times L$. If the number of processing elements available is less than $m^2$, that is, $N \ll m^2$, then the most intuitive way, from a programmer’s viewpoint, to partition the data structures is to assign each PE, $0 \le p \le N-1$, a $1/N$-sized portion of the label array and the entire object array, thus giving each a set

$$S_p = \{(a_h, l_k) \mid 1 \le h \le m,\ p\,(m/N) \le k \le p\,(m/N) + m/N - 1\} \quad \forall p : 0 \le p \le N-1$$

of objects and labels. This creates N horizontal swathes through the possibility matrix as depicted in figure 7. These horizontal swathes constitute our data partitioning scheme.
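The swath assignment can be sketched as a small index computation. This is an illustration in C, not code from the implementation: `swath` and `swath_bounds` are hypothetical names, label indices are taken as zero-based, and the handling of an m that is not a multiple of N (the first m mod N processing elements take one extra label) is an assumption that goes beyond the text.

```c
/* Half-open range [lo, hi) of label indices owned by one processing element. */
typedef struct {
    int lo;
    int hi;
} swath;

/* Labels owned by processing element p (0 <= p < N) out of m labels.
   When N divides m this is p*(m/N) .. p*(m/N) + m/N - 1 inclusive. */
swath swath_bounds(int m, int N, int p)
{
    int base = m / N;    /* labels every PE receives             */
    int extra = m % N;   /* leftover labels, one each to low PEs */
    swath s;
    s.lo = p * base + (p < extra ? p : extra);
    s.hi = s.lo + base + (p < extra ? 1 : 0);
    return s;
}
```

For example, with m = 12 labels and N = 4 processing elements, processing element 2 owns labels 6 through 8; each processing element still sees the entire object array.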

3.5  Communication  Analysis 

Having designed our process and data partitions, we must now identify the inter-process communication required to complete the parallel implementation.

As described previously, a possibility computation requires access to the object/label pair under consideration, $(a_i, l_j)$, provided by the client process, and the set of possible object/label pairs from which the server processes compute a degree of support. The set of possible pairs is statically distributed among the server processes once, upon algorithm initiation, as described above. Conversely, the pair $(a_i, l_j)$ must be provided to each server, dynamically, by the client process. This is achieved via a broadcast operation from the client to every server.

Having received $(a_i, l_j)$, each server process computes a degree of support for the pair based on its set of possible object/label pairs (its data partition). Upon completion, each server reports its degree of support to the client, where the individual degrees of support are combined into a single result and the possibility computation, $p^t(i,j)$, is completed. This is achieved via a reduction operation from every server to the client.

Finally, the client must report $p^t(i,j)$ to the server process whose data partition includes the pair $(a_i, l_j)$ so that it can update its possibility value. This is achieved via a point-to-point send/receive operation from the client to the particular server.

In summary, our process/data partitioning scheme requires three types of communication: 1) broadcast; 2) reduction; and 3) point-to-point send/receive.
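The three communication steps can be traced in a single-process simulation. The sketch below is an assumption-laden illustration rather than the Transputer code: loops stand in for message passing, `N_PE` and `M` are arbitrary sizes, and `compat_fn`, `same_offset`, `owner_of_label`, `server_support`, and `client_evaluate` are hypothetical names, with `same_offset` a toy replacement for the window-based compatibility test.

```c
#define N_PE 4   /* assumed number of server processing elements */
#define M    8   /* assumed number of objects and of labels      */

/* Hypothetical stand-in for the relation (i,j)C(h,k). */
typedef int (*compat_fn)(int i, int j, int h, int k);

/* Toy relation used for illustration only. */
int same_offset(int i, int j, int h, int k) { return (i - j) == (h - k); }

/* Server that owns label j under an even swath partition. */
int owner_of_label(int j) { return j / (M / N_PE); }

/* Server side: partial degree of support for (i,j) over this PE's labels. */
int server_support(int p[M][M], int i, int j, int pe, compat_fn compat)
{
    int lo = pe * (M / N_PE), hi = lo + M / N_PE, card = 0;
    for (int k = lo; k < hi; ++k) {
        int found = 0;
        for (int h = 0; h < M && !found; ++h)
            if (p[h][k] && compat(i, j, h, k))
                found = 1;
        card += found;
    }
    return card;
}

/* Client side: evaluate one queued pair and return its new possibility. */
int client_evaluate(int p[M][M], int i, int j, int q, compat_fn compat)
{
    int total = 0;
    /* 1) Broadcast: every server is handed the same pair (i,j).      */
    /* 2) Reduction: the partial supports are summed into one value.  */
    for (int pe = 0; pe < N_PE; ++pe)
        total += server_support(p, i, j, pe, compat);
    /* 3) Point-to-point: in a message passing system only the PE
          owner_of_label(j) would be told to clear its copy of p[i][j]. */
    if (total < q)
        p[i][j] = 0;
    return p[i][j];
}
```

In this simulation the possibility matrix is shared, so the point-to-point step reduces to a single assignment; on real hardware each server would hold only its own columns.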

This concludes the analysis steps of our methodology as applied to the image matching algorithm. We have described the algorithm, identified its primary control structures, identified its primary data structures, partitioned it into independent processes, and identified all required inter-process communication. Our remaining task is to specify a parallel architecture well suited to the requirements identified by our analysis. This is presented in the following section. We then present an evaluation of the system design arrived at via our methodology through architecture simulation and actual implementation.

3.6  Architecture  Specification 

In specifying a parallel processor architecture we must address various organizational parameters: Programming model; Processing element type; Processing element coupling; Processor homogeneity; Processor synchronization; and Communication network topology. In the classical approach this is done prior to the algorithm analysis, that is, the parallel implementation of the algorithm is specified for a particular parallel architecture. We instead base our specification of these parameters on the results of our algorithm analysis. In the following paragraphs we address each of these organizational parameters and discuss how it is influenced by the processing requirements of the image matching algorithm.

Programming Model. The image matching algorithm (more specifically, the possibility computation) contains various processing steps that are data dependent, that is, not all data items are processed identically. The Multiple Instruction Multiple Data (MIMD) programming model is best suited to this situation. In this model each processing element can execute code dictated by its particular data items. Conversely, the algorithm could be implemented under the Single Instruction Multiple Data (SIMD) programming model, as demonstrated in [Reisis and Prasanna-Kumar, 1987], but processing elements would spend a great deal of time “idling” through code which is not applicable to their data items, thus reducing processor efficiency.

Processing Element Type. Computation of the compatibility relationship, $(i,j)C(h,k)$, between two pairs of object/label correspondences, $(a_i, l_j)$ and $(a_h, l_k)$, requires computation of two windows, $w(i,j,k)$ and $w(h,k,j)$, as well as determining whether or not the objects $a_i$ and $a_h$ lie within the respective windows. These computations require the use of transcendental functions as well as floating point arithmetic (unless integerization is performed). Therefore, the processor utilized must support these computations. Furthermore, to reduce system complexity and programmer burden, the processor must be programmable in a high-level language that allows specification of the primary data structures in a natural way, that is, via multi-field records. Processors best suited to these constraints are of the complex instruction set variety such as a general purpose microprocessor.

Processing Element Coupling. As the communication between processes is in bursts, that is, at the beginning of each possibility computation (the broadcast) and at the end of each possibility computation (the reduction), a tightly coupled or shared memory system would not suffice because of memory access conflicts. Without special protocols to allow concurrent reading and writing of memory, a communication bottleneck would exist. Better suited to the algorithm is a loosely coupled or message passing architecture. These systems facilitate high bandwidth communication without the requirement of special purpose hardware.

Processor Homogeneity. Our partitioning scheme provides each server process with identical tasks. The client process is computationally similar to the server processes in that it utilizes the same data structures as well as similar logic. Therefore, the parallel architecture should be homogeneous, that is, composed of a set of identical processing elements. This facilitates programming (reduction of programmer burden) as well as hardware interfacing of processing elements (reduction of system complexity).

Processor Synchronization. Because all of the identified processes are computationally similar yet perform data dependent processing, the parallel architecture should operate in loosely synchronous mode. That is, all processes incorporate identical copies of the program, with the exception of the client process, and execute under control of their own program counter. Synchronization occurs only at points of communication. As we shall see, this also facilitates programmability of the implementation, which reduces system complexity and programmer burden.

Communication Network Topology. Perhaps the most interesting aspect of a parallel processor architecture is its communication network topology, the processing element interconnect pattern. As we showed via the communication analysis, the image matching algorithm places three constraints on the communication network topology. The first is that it must facilitate an efficient broadcast operation, the second is that it must facilitate an efficient reduction operation, and the third is that it must facilitate an efficient point-to-point send/receive operation. In the following paragraphs we consider each of these constraints.

With regard to the broadcast operation, the ideal message passing architecture is one containing a single common bus to which all processing elements are connected. In this topology a broadcast operation is completed in $O(1)$ time.

With regard to the reduction operation, the ideal algorithm requires $\Omega(\log n)$ time, that is, “order no less than $\log n$ time”, assuming concurrent read and write operations are forbidden [Cole and Vishkin, 1986]. This ideal time is achieved by an algorithm that utilizes a divide and conquer approach. The result is obtained by dividing the data set into two halves, finding the two partial results, and combining the partial results to get the final
result.  The  dividing  is  done  recursively  until  the  data 
sets  are  indivisible.  Such  a  divide  and  conquer  scheme 
yields  a  binary  tree  with  n3  nodes  with  the  data  items 
starting  at  the  leaves.  For  the  image  matching  algorithm 
the  data  items,  the  objects  and  labels  used  to  determine 
the  global  validity  of  a  queued  object/label  correspon¬ 
dence,  can  be  distributed  among  all  nodes  of  the  tree, 
not  just  the  leaves. 

With regard to the point-to-point send/receive operation, the ideal message passing architecture is, again, one containing a single common bus to which all processing elements are connected. In this topology a send/receive operation is completed in $O(1)$ time.

The reduction operation produces the most stringent constraint dictated by the image matching algorithm. A communication network topology that facilitates this operation will also facilitate the other two as they are of lower order complexity. Therefore, for parallel implementation of the image matching algorithm, the processing elements should be connected via a binary tree topology.
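A tree reduction in which every node contributes data, not only the leaves, can be sketched as follows. This is a sequential illustration under stated assumptions, not the Transputer implementation: the tree is stored in heap order (node v has children 2v+1 and 2v+2), recursion stands in for messages passed from children to parents, and `tree_reduce` and `tree_levels` are hypothetical names.

```c
/* Sum-reduction over a binary tree stored in heap order. Each node adds
   its own local contribution to the totals reported by its two children. */
int tree_reduce(const int *local, int n_nodes, int v)
{
    if (v >= n_nodes)
        return 0;                                    /* no such child */
    int left = tree_reduce(local, n_nodes, 2 * v + 1);
    int right = tree_reduce(local, n_nodes, 2 * v + 2);
    return local[v] + left + right;
}

/* Number of levels in a heap-ordered tree of n_nodes nodes. If sibling
   subtrees work concurrently, a reduction needs one communication step
   per level below the root, so the step count grows as log2(n_nodes). */
int tree_levels(int n_nodes)
{
    int levels = 0;
    while (n_nodes > 0) {
        ++levels;
        n_nodes /= 2;
    }
    return levels;
}
```

Seven processing elements form three levels, so the root receives the combined result after two communication steps.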

To summarize, the organizational parameters of a parallel processor architecture that is well suited to the image matching algorithm should be specified as follows:

•  Programming  model  -  MIMD 

•  Processing  elements  -  Complex  Instruction  Set 
Computers 

•  Processor  coupling  -  Loosely  Coupled 

•  Processor  homogeneity  -  Homogeneous 

•  Processor  synchronization  -  Loosely  Synchronous 

•  Communication  network  topology  -  Binary  Tree 

This completes the application of our methodology to the image matching algorithm. In the next section we present an evaluation of the system design in terms of our measures: algorithm speedup, processor efficiency, system complexity, and programmer burden.

3.7  System  Evaluation 

Having completed our parallel implementation of the image matching algorithm, we now present an evaluation of the implementation in terms of our four measures. The evaluation is performed on the basis of three “data points.” First, we use a serial implementation of the algorithm as a baseline with which comparisons can be performed. Second, we use a simulation developed to analyze the implementation relative to any number of processing elements. Third, we use an actual system implementation utilizing INMOS Transputers [INM, 1989] to bring validity and feasibility to the entire study.

As we stated earlier, most research in the field of parallel processing of computer vision algorithms is primarily concerned with algorithm speedup and processor efficiency. For this reason we begin our evaluation and discussion with these two measures.

Algorithm speedup is defined as the ratio of elapsed time when executing a program on a single processor to the elapsed time when N processors are available. That is, for N processing elements, algorithm speedup is defined as

$$S_N = \frac{T_1}{T_N}$$

where $T_1$ and $T_N$ are the elapsed times for 1 and N processing elements, respectively.

Processor efficiency is defined as the average utilization of the available processing elements and can be specified in terms of algorithm speedup, $S_N$. For N processing elements, processor efficiency is defined as

$$E_N = \frac{S_N}{N}$$

If the efficiency, $E_N$, of a parallel implementation remains constant (ideally 1) as the number of processing elements, N, is increased, the parallel implementation of the algorithm is said to have achieved linear speedup.
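The two measures translate directly into code. The sketch below assumes the conventional definitions, speedup as the single-processor time divided by the N-processor time and efficiency as speedup divided by N; the function names `speedup` and `efficiency` are hypothetical and the timings in the note are made-up illustrative values, not measurements from the tables.

```c
/* Algorithm speedup: elapsed time on one processing element divided by
   elapsed time on N processing elements. */
double speedup(double t1, double tn)
{
    return t1 / tn;
}

/* Processor efficiency: speedup divided by the number of processing
   elements. A value of 1.0 at every N corresponds to linear speedup. */
double efficiency(double t1, double tn, int n)
{
    return speedup(t1, tn) / n;
}
```

For instance, a run taking 8 time units on one processing element and 4 units on four gives a speedup of 2 and an efficiency of 0.5.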

3.7.1  Complexity  Analysis 

Previously we determined the complexity of the image matching algorithm to be $O(m^4)$, assuming an equal number of objects and labels, m. This is due to the nested loop structure of the algorithm where every object/label pair, $(a_i, l_j)$, is checked against every other object/label pair, $(a_h, l_k)$, for compatibility.

In our partitioning strategy, we distribute the $m^2$ compatibility computations for each object/label pair’s possibility computation evenly among the N processing elements. Therefore, barring the existence of any data dependencies or overhead, we expect to achieve $O(N)$ speedup and complete processor utilization, that is, an efficiency of 1. Unfortunately, both data dependencies and overhead exist.

The data dependencies contained within the image matching algorithm can be expressed in terms of the possible correspondences between objects and labels. Let us define $P_j$ to be the set of possible object correspondences for each label $l_j$, $1 \le j \le m$. We can then define

$$\hat{P} = \max_{1 \le j \le m} |P_j|.$$

The value k is an indication of how evenly the object/label correspondences are distributed. For instance, if every label forms possible correspondences with the same number of objects, k will be 1. Conversely, if one label forms possible correspondences with a large number of objects and the remaining labels form possible correspondences with a small number of objects, then k will be large.

Using these definitions, the expected values for algorithm speedup and processor efficiency for our implementation of the image matching algorithm (barring any overhead) are

$$S^e_N = \frac{N}{k} \quad \text{and} \quad E^e_N = \frac{1}{k}.$$

As an appeal to one’s intuition, consider the following cases. Recall that our data partitioning scheme calls for the assignment of a set of object/label pairs to each processing element. If all sets contain an equal number of possible correspondences, then

$$\hat{P} = \bar{P} \Rightarrow k = 1 \Rightarrow S^e_N = N \text{ and } E^e_N = 1.$$

This implies that each processing element is assigned the same amount of work. If one set contains more possible correspondences than all of the rest, $\exists j : |P_j| > |P_i|\ \forall i \ne j,\ 1 \le i \le m$, then

$$\hat{P} > \bar{P} \Rightarrow k \gg 1 \Rightarrow S^e_N < N \text{ and } E^e_N < 1.$$

This implies that the processing element assigned the set $P_j$ must do more work than any of the other processing elements.
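The two cases can be checked numerically under one reading of k. The sketch below assumes, as an interpretation consistent with the two cases above rather than a formula confirmed by the text, that k is the size of the largest correspondence set divided by the mean set size, and that the expected speedup and efficiency are N/k and 1/k. The names `imbalance`, `expected_speedup`, and `expected_efficiency` are hypothetical.

```c
/* ASSUMED definition: k = (largest |P_j|) / (mean |P_j|) over m labels.
   Returns 0.0 when no correspondences remain. */
double imbalance(const int *sizes, int m)
{
    int max = 0;
    long sum = 0;
    for (int j = 0; j < m; ++j) {
        if (sizes[j] > max)
            max = sizes[j];
        sum += sizes[j];
    }
    if (sum == 0)
        return 0.0;
    return (double)max * m / (double)sum;
}

/* Expected speedup on n_pe processing elements, ignoring overhead. */
double expected_speedup(int n_pe, double k)
{
    return n_pe / k;
}

/* Expected processor efficiency, ignoring overhead. */
double expected_efficiency(double k)
{
    return 1.0 / k;
}
```

With four labels that each have four candidate objects, k is 1 and four processing elements are expected to give a speedup of 4. With set sizes 9, 1, 1, and 1, k rises to 3 under this reading and the expected efficiency falls to one third.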

These expected values are to be considered estimates of the overhead incurred by the implementation due to data dependencies. One must remember that the actual distribution of possible object/label correspondences is dynamic, as it is the goal of the algorithm to reduce this to a canonical set of correspondences via the relaxation operation. Furthermore, the algorithm contains some inherently serial operations, those of the client process, that must be taken into consideration in the overall performance analysis. Actual values of algorithm speedup and processor efficiency will vary due to this dynamic behavior and serialism. The goal is to minimize the effects of these on the performance of the parallel implementation.

3.7.2  Measured  Performance 

To measure the actual values of algorithm speedup and processor efficiency we devised three test cases. The first is comprised of two identical images containing multiple vertical lines. In this scenario k = 1. The second is comprised of an image containing one vertical line and multiple horizontal lines and an image containing one horizontal line and multiple vertical lines. In this scenario k = m/2, where m is the number of labels. The third is comprised of two identical images containing lines extracted from an airfield image. In this scenario k = 1.67.

Table 1 shows the execution times for the three scenarios when instantiated with various problem sizes. The first four rows are for the first scenario with the number of labels, m, being 12, 24, 36, and 48. The next three rows are for the second scenario with the number of labels being 50, 100, and 200. The last three rows are for the third scenario with the number of labels being 51, 102, and 153. Simulation runs were done with the number of processing elements being 1, 2, 3, 4, 15, and m, the number of labels. These are represented by the six columns.

Table 2 shows the measured speedup for each of the test cases and table 3 shows the efficiency. One should note that these measured values do not include overhead for inter-processor communication. They strictly reflect the algorithm speedup and processor efficiency as affected by our process and data partitioning schemes and the data dependencies.

To observe the effects of inter-processor communication on the overall performance (as well as to demonstrate a complete application of our methodology) we also developed an actual parallel processor system based on our implementation of the image matching algorithm. The system is comprised of one to four INMOS Transputers but is expandable to incorporate any number of processing elements without any system redesign.

Table 1: Execution times from simulation.

Table 2: Speedup from simulation.

Tables  4,  5,  and  6  show  the  measured  execution  times, 
algorithm  speedup,  and  processor  efficiency,  respectively, 
for  the  various  test  cases  and  problem  sizes. 

Although our data points are sparse, the tables do indicate the following trends in terms of algorithm speedup and processor efficiency for our parallel implementation of the image matching algorithm:

• The implementation is most effective when the problem size is large, that is, when the number of possible object/label correspondences is large.

• The implementation is most effective when the number of processing elements is less than or equal to the number of labels (when data dependencies are taken into account).

• In light of the previous items, inter-processor communication does not dominate the implementation.

Table 3: Efficiency from simulation.

for (i = 0; i < number_of_objects; ++i)
  for (j = 0; j < number_of_labels; ++j)
    if (p[i][j]) {  /* if p(i)(j) == 1 */
      [...] (BROADCAST).
      /* Compute the degree of support for the object/label */
      /* assignment provided by this PE's 1/nth of the label table. */
      for (k = 0; k < my_share; ++k) {  /* for my share of labels */
        make_window(objects[i], labels[j], labels[k], win_ijk);
        h = 0; found = 0;
        while ((h < number_of_objects) && (!found)) {
          if (p[h][k] && in_window(objects[h], win_ijk))
            found = compatible(objects[i], labels[j], objects[h], labels[k]);
          ++h;
        } /* while ((h < number_of_objects) ... */
        if (found)
          ++card_s;
      } /* for (k = ... */
      RECEIVE CONTRIBUTIONS FROM CHILDREN (REDUCTION).
      card_s += left_child_contribution;
      card_s += right_child_contribution;
      if (card_s < q) {
        flag = 1;
        p[i][j] = 0;
        SEND CHANGES TO THE PE WITH PAIR (i,j).
      } /* if (card_s ... */
    } /* if (p[i][j] ... */
} /* while (flag) ... */

Figure 8: Client code for parallel image matching algorithm.

Table 5: Speedup from Transputer implementation.

We  now  focus  our  attention  on  our  two  new  measures, 
system  complexity  and  programmer  burden. 

3.7.3  System  Development  and  Maintenance 

As stated earlier, system complexity is a measure of how closely the parallel implementation of the algorithm resembles the serial implementation. It can also be viewed as the amount of effort (cost) required to realize the parallel implementation of the algorithm.

Previously, we showed a program segment for the image matching algorithm’s primary control structures as implemented on a serial machine. The programming language was ‘C’ [Kernighan and Ritchie, 1978]. In figures 8 and 9 we show program segments for the client and server processes, respectively, also written in ‘C’. Note that the algorithm-specific constructs are identical in the serial and parallel programs. The only difference is the inclusion of the subroutine calls to perform inter-processor communication. Therefore, one can conclude that the complexity of the parallel software is no greater than that of the serial implementation. This is attributable to the fact that the parallel implementation was designed based on the structure of the algorithm, and not on the structure of the parallel processor architecture.

Table 6: Efficiency from Transputer implementation.

Programmer burden is a measure of the degree of difficulty in developing and maintaining the parallel algorithm implementation. It can also be viewed as the amount of effort (cost) required to modify and debug the parallel software in light of algorithm modifications. In computer vision, this measure is critical because the vision problem is far from being solved and algorithm refinements arrive at a high rate.

As discussed above, the software for the parallel implementation of the image matching algorithm is identical to that of the serial implementation as far as algorithm specific constructs are concerned. Therefore, algorithm debugging and modification can take place in the serial environment where advanced tools are readily available and then ported directly to the parallel environment. Using this technique, once we achieved a “bug free” version of the algorithm on a serial computer, we were able to get it running in parallel in approximately twelve hours. The primary effort was in validating the inter-process communication. But, once validated, the code that implements the communication is functionally portable to other algorithms that utilize the same communication network topology and need not be validated again.

Figure 9: Server code for parallel image matching algorithm.

Therefore, we can conclude that we have minimized the measure of programmer burden in that the development and maintenance efforts for the parallel implementation were performed, predominantly, in the serial environment.

This image matching algorithm was previously mapped onto a parallel processor architecture via the classical approach of specifying an architecture then mapping the algorithm onto it [Reisis and Prasanna-Kumar, 1987]. In that study the specified architecture was a 2D mesh connected SIMD architecture consisting of relatively simple processing elements, a parallel processor architecture well suited to low-level computer vision tasks. By using elaborate data partitioning and memory access schemes, an implementation that theoretically achieves linear speedup (in the absence of data dependencies) was developed. When data dependencies are considered, the implementation achieves the same measures of algorithm speedup and processor efficiency as we presented. The study was purely theoretical and implementation was not actually carried out. To actually implement the system would be extremely difficult due to the nature of the data partitioning scheme, which is retinotopically based to match the communication network topology of the architecture. Furthermore, one can show that the effects of the data dependencies on speedup and efficiency are exaggerated due to the synchronous nature of the SIMD machine. Finally, any modification of the implementation (algorithm) would require intimate knowledge of the algorithm, the architecture, and the implementation, as is the case with most “classical” parallel algorithm implementations.

We have shown that these situations can be overcome
by design (or selection) of a parallel processor architecture
based on the processing and data requirements of
the algorithm rather than specifying the algorithm
implementation to meet the specifications of the parallel
processor architecture.

4  Summary 

We have described a methodology for mapping algorithms
onto parallel processor architectures. Utilizing
our methodology to analyze, simulate, and implement a
commonly used algorithmic technique, relaxation, we exposed
various characteristics common among high-level
vision algorithms that must be considered when designing
a parallel implementation if the goals of maximized
algorithm speedup, maximized processor efficiency, minimized
system complexity, and minimized programmer
burden are to be achieved. These characteristics include:
1) the use of complex program logic; 2) the existence of
subtle data dependencies; 3) the use of heterogeneous
data structures; and 4) the dynamic nature of the data.

As shown in [Reisis and Prasanna-Kumar, 1987], when
designing an implementation targeted for a specific parallel
processor architecture these issues either cannot be
sufficiently addressed or require extremely convoluted,
unintuitive solutions which lead to an implementation
which is difficult to develop and maintain. In applying
our methodology we have shown that all of these issues
can be addressed without sacrificing any of the goals.

We are currently applying the methodology to other
stand-alone computer vision algorithms as well as to
complete computer vision systems to determine its utility
in specifying a reconfigurable or heterogeneous parallel
processor architecture for implementation of such an
algorithm suite. We are also investigating the usefulness
(cost versus payoff) of dynamic data partitioning (load
balancing) schemes in their application to high-level vision
algorithm implementations.
Abstract


The detection of edges is only one of the first steps in the understanding of images. Further
processing necessarily involves grouping operations between contours. We present a
representation of edge contours by approximating B-splines and show that such a representation
facilitates the extraction of symmetries between contours. Our representation is
rich, compact, stable, and does not critically depend on feature extraction, whereas interpolating
splines do. We turn our attention to the detection of two types of symmetries,
skewed and parallel, which have proven to be of great importance for inferring shape from contour,
and show that our representation is computationally attractive. As an application,
we show how parallel symmetries can be used to infer the 3-D orientation of a torus from
its intensity image.

1  Introduction 

Edge detection is not a goal in itself, but has to be considered as one of the steps
involved in understanding images. The question therefore arises of how to represent the
contours formed by edgels (edge elements).

Iconic  representations,  such  as  edge  maps,  or  chain  codes  [11]  do  not  make  the  necessary 
information  explicit:  by  definition  edgels  only  capture  very  local  properties  of  an  image,  and 
the  inference  of  higher  structures,  such  as  object  boundaries,  requires  grouping  operations. 
We  believe  that  such  operations  rely  on  basic  and  simple  properties  and  various  forms  of 
symmetry  [18].  The  representation  must  therefore  make  explicit  differential  properties  of 
contours,  such  as  tangent  and  curvature.  Furthermore,  because  of  the  variability  inherent 
in the imaging process, the representation should be tolerant to noise, partial occlusion,
and  perspective,  naturally  suggesting  segmented,  local  descriptors  [30]. 

If the world were composed of polyhedral objects alone, we would know to expect only
straight  line  segments  in  images,  and  polygonal  approximations  such  as  performed  by 
the  Linear  package  [22]  or  Hough  transforms  [2]  would  be  appropriate.  In  many  cases, 
such  approximation  is  indeed  appropriate,  as  demonstrated  by  several  applications  such  as 
stereo  [16],  aerial  image  understanding  [13]  or  object  recognition  [19,  33],  but  is  unable  to 
capture  curvature  information,  since  it  is  a  first  order  approximation.  Also,  if  a  contour  is 
smooth,  the  number  of  points  required  to  approximate  it  may  be  quite  large,  and  the  exact 
position  of  the  points  somewhat  unrelated  to  the  contour  itself.  Another  possibility  is  to 
use a mixture of curves and lines [25], but this leads to instabilities in the description when
we switch from one to another.

These  issues  have  been  tackled  by  the  graphics  community  in  the  context  of  design,  and 
we  propose  to  use  some  of  the  resulting  tools,  particularly  B-splines. 

In  the  next  section,  we  briefly  review  some  of  the  successful  applications  of  B-splines 
to  contour  representation  in  computer  vision,  but  propose  that  approximating  splines  are 
more appropriate than interpolating splines because they are more tolerant of segmentation
errors. We derive the equations and present results demonstrating that the resulting
representation  is  compact  and  faithful  to  the  original  data  for  smooth  or  piecewise  smooth 
contours,  open  or  closed. 

We  then  turn  our  attention  to  an  application  ideally  suited  for  our  representation, 
the  detection  of  symmetries.  Whereas  it  is  easy  to  define  symmetry  between  two  infinite 
straight  lines,  the  concept  of  symmetry  between  curves  is  harder  to  define:  Rosenfeld  [31] 
provides a lucid account of the differences between Blum’s [4], Brooks’ [7], and Brady’s [5]
definitions,  and  a  more  recent  paper  by  Ponce  [28]  gives  further  comparisons.  Here,  we  are 
interested  not  in  local  symmetries  which  provide  skeletal  shape  primitives,  but  rather  in 
symmetries  which  help  to  infer  shape  from  contour:  Nevatia  and  Ulupinar  [34]  postulate 
that  they  are  skewed  and  parallel. 

We recall these definitions in section 3, and show how each one can be extracted using
our B-spline representation. The most obvious advantages are the low computational
complexity of the process and the stability of the results. Finally, we show that for the
very specific case of a torus, the detection of parallel symmetries allows us to infer the 3-D
orientation of the object in a much simpler fashion than proposed in [29].


2  Contour  Representation 

A  very  promising  idea  for  representing  image  contours  is  to  use  piecewise  polynomials.  The 
advantages  are  obvious:  this  representation  is  rich,  compact,  analytical  and  local  in  the 
sense  that  a  small  change  in  the  original  curve  does  not  affect  the  representation  entirely. 

The approach commonly used consists of first extracting a set of knots from the discrete
curve and then approximating the curve between each pair of knots by polynomials under
continuity constraints at the knots. In [27], the knots are extracted by taking the vertices
of a polygonal approximation, then a finite search technique such as dynamic programming
is used to select those knots which provide the best approximation by cubics. A slightly
different idea is proposed in [21] where the initial knots are selected by taking the vertices
of the polygonal approximation of the tangent orientation signal $\theta(s)$. The final knots are
determined using a split and merge algorithm. Conics are used instead of cubics. In [15], the
initial knots are the zero-crossings and extrema of the curvature computed after different
amounts of Gaussian smoothing and are then selected, again using dynamic programming.
Monotone curvature splines are used to interpolate through the selected knots. In [1], the
knots are the corners and smooth joins marked in the curvature primal sketch. Results
using circular splines are shown. Finally, an elegant algorithm is proposed in [17] which
at the same time locates corners and encodes curve segments between them using cubic
B-splines.

Our main objection to these methods is that they rely too heavily on the always critical
segmentation step, which raises the stability issue. Also,
techniques such as dynamic programming can yield a complexity of $O(n^3)$ where $n$ is the
number of initial knots [27]. Finally, a simple case such as a circle, from which no curvature
features can be extracted, is a typical example of a curve which could nevertheless
easily be approximated using the following B-spline least-squares fitting method which, as
we will see, does not require any knot selection and is relatively insensitive to noise.

2.1  B-spline  Least-Squares  Curve  Fitting 

Although  there  are  numerous  textbooks  on  the  subject  of  B-splines  (see  [3,  10]  for  example), 
it  is  useful  to  recall  some  basic  definitions  and  properties.  It  is  well  known  that  a  B-spline 
is  a  piecewise  polynomial.  Cubic  polynomials  are  often  used  since  they  are  the  lowest  order 
for  which  the  curvature  can  change  sign.  A  B-spline  is  expressed  as  a  linear  combination 
of  basis  functions  which  are  themselves  piecewise  polynomials,  the  coefficients  being  the 
vertices  of  the  B-spline  guiding  polygon.  Thus,  a  B-spline  can  be  easily  manipulated  by 
modifying its guiding polygon, hence its popularity in CAD systems. Furthermore, as B-splines
are defined locally, modifying the position of a vertex does not affect the B-spline
entirely. In the case of a planar curve, a B-spline $Q(u) = (X(u), Y(u))$ with $m + 1$ vertices
is defined as follows:

$$Q(u) = \sum_{j=0}^{m} V_j B_j(u) = \sum_{j=0}^{m} \bigl( X_j B_j(u),\; Y_j B_j(u) \bigr)$$

In the above equations, the $V_j = (X_j, Y_j)$ are the vertices of the guiding polygon and $B_j(u)$
the basis functions.

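Let $C$ be an ordered set of $p + 1$ points $P_i = (x_i, y_i)$. What is the B-spline which best
approximates $C$? An approach proposed in [3] consists of minimizing the distance

$$R = \sum_{i=0}^{p} \| Q(u_i) - P_i \|^2
    = \sum_{i=0}^{p} \Bigl[ \Bigl( \sum_{j=0}^{m} X_j B_j(u_i) - x_i \Bigr)^2
    + \Bigl( \sum_{j=0}^{m} Y_j B_j(u_i) - y_i \Bigr)^2 \Bigr]$$

In the above formulation, $u_i$ is some parameter value associated with the $i$th data
point [3]. Minimizing $R$ is equivalent to setting all partial derivatives $\partial R/\partial X_l$ and $\partial R/\partial Y_l$
to 0, for $0 \le l \le m$, which yields

$$\sum_{j=0}^{m} X_j \sum_{i=0}^{p} B_j(u_i) B_l(u_i) = \sum_{i=0}^{p} x_i B_l(u_i) \qquad (1)$$

$$\sum_{j=0}^{m} Y_j \sum_{i=0}^{p} B_j(u_i) B_l(u_i) = \sum_{i=0}^{p} y_i B_l(u_i) \qquad (2)$$

with $0 \le l \le m$.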
Figure 1: Example of Planar Curve Cubic B-Spline Fit

The linear systems (1) and (2) are easily solved for all $X_j$ and $Y_j$ respectively using standard
linear algebra, yielding the guiding polygon of the B-spline which best approximates
the original curve. In the case of open curves, we have the option to force end-points to be
interpolated. In this case, the first and last vertices are simply set to lie at the end-points
so that the linear systems are reduced to $m - 1$ equations in $m - 1$ unknowns. In the case
of closed curves, the linear systems are over-constrained since some vertices are required
to be identical. Hence the pseudo-inverse method [2] can be used. For more details about
B-splines and end conditions, see [3].
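
The least-squares fit described by systems (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the chord-length choice of the parameters $u_i$, the uniform clamped knot vector, and the helper names (`clamped_knots`, `basis`, `solve`, `fit_bspline`) are all assumptions, and end-points are not forced to be interpolated.

```python
import math

def clamped_knots(m, degree):
    """Uniform knot vector on [0, 1] with repeated end knots, for m + 1 vertices."""
    interior = m - degree  # number of interior knots (requires m >= degree)
    return ([0.0] * (degree + 1)
            + [i / (interior + 1) for i in range(1, interior + 1)]
            + [1.0] * (degree + 1))

def basis(j, degree, u, knots):
    """Cox-de Boor recursion for the basis function B_j(u)."""
    if degree == 0:
        return 1.0 if knots[j] <= u < knots[j + 1] else 0.0
    value = 0.0
    span = knots[j + degree] - knots[j]
    if span > 0.0:
        value += (u - knots[j]) / span * basis(j, degree - 1, u, knots)
    span = knots[j + degree + 1] - knots[j + 1]
    if span > 0.0:
        value += (knots[j + degree + 1] - u) / span * basis(j + 1, degree - 1, u, knots)
    return value

def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting; rhs holds one row per equation."""
    n = len(matrix)
    aug = [list(matrix[r]) + list(rhs[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [[aug[r][n + c] / aug[r][r] for c in range(len(rhs[0]))] for r in range(n)]

def fit_bspline(points, m, degree=3):
    """Return (vertices, R): the m + 1 guiding-polygon vertices and the residual R."""
    # Chord-length parameters u_i in [0, 1] (an assumed parameterization).
    length = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length.append(length[-1] + math.hypot(x1 - x0, y1 - y0))
    knots = clamped_knots(m, degree)
    # rows[i][j] = B_j(u_i); u is kept just below 1 so the last span stays half-open.
    rows = [[basis(j, degree, min(d / length[-1], 1.0 - 1e-12), knots)
             for j in range(m + 1)] for d in length]
    # Normal equations (1) and (2), solved together for the X and Y coordinates.
    normal = [[sum(row[j] * row[l] for row in rows) for j in range(m + 1)]
              for l in range(m + 1)]
    rhs = [[sum(row[l] * p[c] for row, p in zip(rows, points)) for c in (0, 1)]
           for l in range(m + 1)]
    vertices = solve(normal, rhs)
    residual = sum((sum(b * v[c] for b, v in zip(row, vertices)) - p[c]) ** 2
                   for row, p in zip(rows, points) for c in (0, 1))
    return vertices, residual
```

Solving the normal equations directly keeps the sketch close to the text; for closed curves or badly conditioned data, a pseudo-inverse or a library least-squares routine would be the safer choice.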

As an example, figure 1(a) shows a free-form planar curve, figure 1(b) the guiding
polygon of the approximating cubic B-spline, and figure 1(c) the reconstructed curve from
the guiding polygon. Note that not only is the reconstructed curve almost identical to the
original curve, but the internal representation of the approximating curve also consists of only
a small number of vertices (20 in this example).
Figure 2: A Cubic Spline Fit on Noisy Data. Panels: (b) Gaussian Noise Added; (c) B-Spline and Guiding Polygon Overlaid on Noisy Data.

Another  example  displayed  in  figure  2  illustrates  the  fact  that  this  method  is  relatively 
insensitive  to  noise,  hence  stable.  Figure  2(a)  shows  a  circle  and  figure  2(b)  the  resulting 
curve  after  having  added  Gaussian  noise  to  the  data  points.  Figure  2(c)  shows  the  result 
of  fitting  a  cubic  B-spline  to  the  noisy  data  using  4  vertices. 

The choice of $m$ (the number of vertices) determines how close to the original data the
approximation is, which is measured by $R$ (see above). The automatic selection of the
number of vertices is not trivial. Our approach is to first set a fitting tolerance $r_0$
and then find the value of $m$ which yields the normalized distance $r = R/(p + 1)$ closest to $r_0$,
using a binary search approach.
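
The selection of $m$ can be illustrated with a short Python sketch. It assumes that the normalized distance $r$ does not increase as $m$ grows, which the text does not state explicitly; `normalized_error` is a hypothetical callable standing in for "fit with $m$ vertices and return $r$".

```python
def select_vertex_count(normalized_error, r0, m_min, m_max):
    """Binary search for the m in [m_min, m_max] whose error r is closest to r0.

    Assumes normalized_error(m) is non-increasing in m.
    """
    lo, hi = m_min, m_max
    while lo < hi:
        mid = (lo + hi) // 2
        if normalized_error(mid) <= r0:
            hi = mid          # tolerance met: try fewer vertices
        else:
            lo = mid + 1      # tolerance not met: need more vertices
    # lo is now the smallest m meeting the tolerance (or m_max if none does);
    # its predecessor may still be closer to r0, so compare the two.
    candidates = [m for m in (lo - 1, lo) if m_min <= m <= m_max]
    return min(candidates, key=lambda m: abs(normalized_error(m) - r0))
```

With a toy error model such as `lambda m: 10.0 / m`, the search needs only a logarithmic number of fits instead of one fit per candidate value of $m$.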


2.2  From  Edgels  to  B-Splines 

The input for our system is an edge map produced by an edge detector such as Canny’s [9].
Three stages are sequentially considered: linking, corner detection, and B-spline approximation.
2.2.1 Linking


Although numerous linking methods have been proposed in the literature [2], we use a
simple and fast scheme which can be summarized as follows. No gap-bridging or other
task is performed, the goal being a fast extraction of the elementary curves present in the
image. It is also our belief that point-wise surgery is too myopic, and that if grouping is
needed, it has to be performed at a higher level [18]. The condition for the algorithm to
work correctly is that the input edges have to be 8-connected. If this is not the case, then a
simple [8] or a more elaborate [24] thinning algorithm has to be applied before going further.
The first stage consists of labelling edge points according to the number of neighbors
they have in order to distinguish among three types of edgels: end-points, junctions, and
mid-points. Then an edge follower is applied starting from end-points and stopped when
other end-points or junctions are encountered. The resulting open curves are stored and
the corresponding points deleted from the edge map except for the junctions. The same
procedure is applied but starting from junctions and stopped when other junctions are met.
The only remaining curves in the edge map should be closed and are finally extracted by
applying the edge follower starting from any mid-point.
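
As a rough illustration of the first two steps, the Python sketch below labels edgels by their 8-neighbor count and follows an open curve from an end-point. The thresholds (one neighbor for an end-point, two for a mid-point, three or more for a junction) are an assumption, since the text only says that the labels depend on the number of neighbors; the function names are hypothetical, and the input is assumed to be thinned.

```python
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbors(pixel, edge):
    """Return the 8-connected neighbors of pixel that belong to the edge set."""
    r, c = pixel
    return [(r + dr, c + dc) for dr, dc in OFFSETS if (r + dr, c + dc) in edge]

def classify_edgels(edge):
    """Label each edgel as 'end', 'mid', 'junction' or 'isolated'."""
    labels = {}
    for pixel in edge:
        count = len(neighbors(pixel, edge))
        if count == 0:
            labels[pixel] = "isolated"
        elif count == 1:
            labels[pixel] = "end"
        elif count == 2:
            labels[pixel] = "mid"
        else:
            labels[pixel] = "junction"
    return labels

def follow_from(start, edge, labels):
    """Follow a curve from start until another end-point or a junction is met."""
    curve = [start]
    visited = {start}
    current = start
    while True:
        candidates = [p for p in neighbors(current, edge) if p not in visited]
        if not candidates:
            break
        current = candidates[0]
        curve.append(current)
        visited.add(current)
        if labels[current] in ("end", "junction"):
            break
    return curve
```

For example, on a Y-shaped set of pixels whose three arms meet at `(2, 2)`, following from the end-point `(0, 0)` stops as soon as the junction is reached.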

2.2.2  Corner  Detection 

The  detection  of  corners  is  essential  for  the  description  of  planar  curves.  Corners  correspond 
to  tangent  discontinuities  to  which  human  beings  are  very  sensitive. 

We have recently developed an algorithm called adaptive smoothing [32] based on the
anisotropic diffusion principle [26], which consists of smoothing a signal while preserving
and even enhancing its significant discontinuities. This is achieved by repeatedly convolving
the signal with a very small averaging filter modulated by a measure of the signal
discontinuity at each point. The method is extremely attractive since a single parameter
$k$ fixes the amplitude of the discontinuities to be preserved. Those features are then easy to
detect and directly localized. Hence no coarse-to-fine correspondence problem has to be
solved, as is the case in the curvature primal sketch [1].

When the signal consists of the tangent orientation $\theta(s) = \arctan(\dot{y}(s)/\dot{x}(s))$ of a planar curve
$C(s) = (x(s), y(s))$, the adaptive smoothing process tends to preserve and enhance tangent
orientation discontinuities which correspond to corners. Corner detection then consists of
differentiating the smoothed version of $\theta(s)$ and extracting the local extrema of the resulting
signal which lie above a threshold, which we set equal to the smoothing parameter $k$.
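
The following Python sketch illustrates the idea on a sampled orientation signal. It is a minimal sketch under stated assumptions rather than the algorithm of [32]: the weight $\exp(-g^2 / 2k^2)$ applied to a central-difference gradient $g$, the three-sample window, the iteration count, and the function names are all assumed, and the wrap-around of angles at $\pm\pi$ is ignored.

```python
import math

def adaptive_smooth(signal, k, iterations=10):
    """Repeated 3-sample weighted averaging; weights shrink across large jumps."""
    s = list(signal)
    n = len(s)
    for _ in range(iterations):
        # Discontinuity measure: central difference, with the end samples replicated.
        grad = [(s[min(i + 1, n - 1)] - s[max(i - 1, 0)]) / 2.0 for i in range(n)]
        weight = [math.exp(-(g * g) / (2.0 * k * k)) for g in grad]
        smoothed = []
        for i in range(n):
            window = [max(i - 1, 0), i, min(i + 1, n - 1)]
            total = sum(weight[j] for j in window)
            if total > 0.0:
                smoothed.append(sum(weight[j] * s[j] for j in window) / total)
            else:
                smoothed.append(s[i])  # all weights underflowed: leave the sample as is
        s = smoothed
    return s

def detect_corners(theta, k, iterations=10):
    """Indices i where the smoothed signal jumps by more than k between i and i + 1."""
    s = adaptive_smooth(theta, k, iterations)
    diff = [abs(s[i + 1] - s[i]) for i in range(len(s) - 1)]
    corners = []
    for i, value in enumerate(diff):
        left = diff[i - 1] if i > 0 else 0.0
        right = diff[i + 1] if i + 1 < len(diff) else 0.0
        if value > k and value >= left and value > right:
            corners.append(i)
    return corners
```

On a signal made of a 1.5 rad step with a small alternating ripple, the ripple is averaged away while the step survives and is reported as the only corner.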

2.2.3  B-spline  Approximation 

Once the image edge points have been linked into curves and the corners detected, the
final step toward image contour representation consists of approximating each elementary
curve by a B-spline. When a closed curve with no corners is considered, a global least-squares
approximation is performed. In the case of an open curve or a closed curve with
corners, each curve segment between pairs of corners is approximated with the constraint
that the end-points have to be interpolated. This ensures that the reconstructed curves are
continuous at the corner locations.

Figure  3  summarizes  the  complete  process  on  an  example.  The  167  x  222  intensity 
image  of  a  Mozart  bust  is  displayed  in  figure  3(a)  and  the  contour  points  obtained  after 
applying  a  Canny  type  edge  detector  in  figure  3(b).  Those  edge  points  have  been  linked  into 
elementary curves and figure 3(c) shows the longest curve found. The corners detected after
adaptive  smoothing  are  displayed  in  figure  3(d),  the  end-points  being  marked  as  corners  as 
well.  Finally,  a  quadratic  B-spline  approximation  of  each  curve  segment  between  corners 
has  been  done  using  a  fitting  tolerance  of  0.5  which  led  to  the  guiding  polygon  displayed 
in  figure  3(e)  and  the  corresponding  quadratic  B-spline  displayed  in  figure  3(f). 

Finally,  in  order  to  illustrate  that  our  method  is  very  tolerant  to  segmentation  errors,  we 
show  some  results  obtained  after  different  corner  detections.  Figure  4(a)  shows  the  result  of 
a  first  corner  detection  on  the  contour  of  a  telephone  handset,  and  figure  4(b)  and  figure  4(c) 
show  the  guiding  polygon  and  the  reconstructed  curve  obtained  using  a  quadratic  B-spline 
approximation.  If  a  corner  is  missed,  as  shown  in  figure  4(d),  then  the  guiding  polygon 
shown  in  figure  4(e)  is  obtained.  The  reconstructed  curve  displayed  in  figure  4(f)  is  very 
similar  to  the  one  obtained  with  the  additional  corner. 

In  the  following  section,  we  show  how  piecewise  polynomial  representations  of  image 
contours  can  be  used  for  detecting  symmetries  in  the  image  plane. 

3  Symmetry  Detection 

The detection of symmetries is an essential step when inferring shapes from contours [35].
It has been shown [14] that skewed symmetries can be used for recovering the 3-D structure
of polyhedra from their 2-D line drawings. In the case of surfaces of revolution [20], the
orthographic projection of their limbs exhibits reflectional symmetry, the axis of revolution
being the back-projection of the symmetry axis. More recently [34], two kinds of symmetries
have been proposed to give significant information about the surface shape for a variety of
3-D objects: skewed and parallel symmetries. The method is applicable to Zero Gaussian
Curvature surfaces, and to a variety of doubly curved surfaces. Results on the recovery of
surface orientation for cylindrical and conic objects are shown but the method still needs
to be validated on real data. We propose to use the contour representation presented
above for detecting skewed and parallel symmetries. In the remainder of this paper, the
lines joining symmetric points will be called lines of symmetry and the mid-points of these
lines will form the axis of symmetry.

Figure 3: Example of the Overall Process on a Mozart Bust Image. Panels: (a) Intensity Image; (b) Jump Edges; (c) After Linking and Short Curves Removed; (d) Detected Corners; (e) Guiding Polygons; (f) Reconstruction.

Figure 4: If a Corner is Missed... Panels: (a) Detected Corners; (b) Guiding Polygon; (c) Reconstruction; (d) Detected Corners; (e) Guiding Polygon; (f) Reconstruction.

3.1  Skewed  Symmetry 

3.1.1  Previous  Work 

The detection of skewed symmetries has been investigated by several researchers over the
past few years [12, 29, 35]. In [12], the method is based on the moments of a figure,
hence limiting the detection to non-occluded objects. In [29], a local approach, as for the
detection of smooth local symmetries [5], is used. It consists of using a local property of the
skewed symmetry in order to identify symmetric edge points. It is thus necessary to test
every possible pair of edge points against the property, which leads to an $O(n^2)$ algorithm
where $n$ is the number of points. However, it is possible to reduce the complexity by using
the method of projections [23]. Finally, a method using the Hough transform is proposed
in [35] but, as opposed to the two previous papers, no results are shown.
Figure 5: 3-D Mirror Symmetry


3.1.2  Theoretical  Study 

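Let us suppose that we are given two parametric 3-D curves $C_1(s)$ and $C_2(s)$ which are 3-D
mirror symmetric with respect to a plane $\mathcal{P}$ (see figure 5). Without loss of generality, we
can choose a coordinate system such that we have

$$C_1(s) = \begin{pmatrix} x(s) \\ y(s) \\ z(s) \end{pmatrix}
\quad \text{and} \quad
C_2(s) = \begin{pmatrix} -x(s) \\ y(s) \\ z(s) \end{pmatrix}$$

Projecting these curves orthographically onto the image plane spanned by $\vec{u}$ and $\vec{v}$, with
viewing direction $\vec{w}$, where

$$\vec{w} = \begin{pmatrix} \cos(\theta)\sin(\phi) \\ \sin(\theta)\sin(\phi) \\ -\cos(\phi) \end{pmatrix},
\quad
\vec{u} = \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \sin(\theta)\cos(\phi) \\ \sin(\phi) \end{pmatrix},
\quad
\vec{v} = \begin{pmatrix} \sin(\theta) \\ -\cos(\theta) \\ 0 \end{pmatrix}$$

we obtain the parametric planar curves $c_1(s) = (u_1(s), v_1(s))$ and $c_2(s) = (u_2(s), v_2(s))$
where

$$\begin{cases}
u_1(s) = \cos(\theta)\cos(\phi)\,x(s) + \sin(\theta)\cos(\phi)\,y(s) + \sin(\phi)\,z(s) \\
v_1(s) = \sin(\theta)\,x(s) - \cos(\theta)\,y(s)
\end{cases}$$

and

$$\begin{cases}
u_2(s) = -\cos(\theta)\cos(\phi)\,x(s) + \sin(\theta)\cos(\phi)\,y(s) + \sin(\phi)\,z(s) \\
v_2(s) = -\sin(\theta)\,x(s) - \cos(\theta)\,y(s)
\end{cases}$$

The curve $a(s) = (u_a(s), v_a(s))$ such that

$$u_a(s) = \frac{u_1(s) + u_2(s)}{2} = \sin(\theta)\cos(\phi)\,y(s) + \sin(\phi)\,z(s)$$

$$v_a(s) = \frac{v_1(s) + v_2(s)}{2} = -\cos(\theta)\,y(s)$$

is simply the projection of the curve $(0, y(s), z(s))$ which is the axis of the 3-D mirror
symmetry with respect to the plane $\mathcal{P}$. It is easy to verify that

$$\frac{v_1(s) - v_2(s)}{u_1(s) - u_2(s)} = \frac{\tan(\theta)}{\cos(\phi)} = a_0$$

hence the lines of symmetry are parallel to each other in the image plane and have the
same direction $\alpha_0 = \arctan(a_0)$. If $C_1(s)$ and $C_2(s)$ are co-planar, i.e. $y(s)$ and $z(s)$ are
linear, we have the following well-known result that $a(s)$ has to be straight, hence $c_1(s)$
and $c_2(s)$ are skewed symmetric with respect to $a(s)$.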

Now given two curves $c_1(s)$ and $c_2(s)$ in the image plane, is it possible to
identify a unique 3-D mirror symmetry? Let us take the example of figure 6. Figure 6(a)
shows the contour of a symmetric planar object viewed from an arbitrary direction and
figure 6(b) a window taken out of the previous image. If we choose a direction $\alpha_0$ and
draw the line segments joining each point of a curve with the corresponding point in the
other curve along that direction, these segments define the lines of symmetry of some 3-D
mirror symmetry. Figure 6(d) shows what happens when $\alpha_0 = 90^\circ$ and figure 6(c) the
corresponding axis of symmetry. Figures 6(e) and 6(f) show another case when $\alpha_0 = 160^\circ$.
This example illustrates that in fact any value for $\alpha_0$ is a possible answer.

If however we are looking for 3-D mirror symmetries between planar curves, the answer
is clear since we saw that the axis of symmetry has to be straight. In this case, the problem
consists of finding a value for $\theta_0$ which yields a straight axis. Figures 6(g) and 6(h) show the
results obtained when the “correct” value for $\theta_0$ has been found. The following describes
how the quadratic B-spline representation of image contours helps us in finding skewed
symmetries.

3.1.3  Quadratic  B-Spline  Implementation 

Let $c_1(u) = (x_1(u), y_1(u))$ and $c_2(v) = (x_2(v), y_2(v))$ be two conic segments (see figure 7),
both defined over the interval $[0, 1]$, given by the equations

$$\begin{cases}
x_1(u) = a_{x_1} u^2 + b_{x_1} u + c_{x_1} \\
y_1(u) = a_{y_1} u^2 + b_{y_1} u + c_{y_1}
\end{cases}$$

and

$$\begin{cases}
x_2(v) = a_{x_2} v^2 + b_{x_2} v + c_{x_2} \\
y_2(v) = a_{y_2} v^2 + b_{y_2} v + c_{y_2}
\end{cases}$$

Figure 6: A Case Study on the Contour of a Planar Object. Panels: (a) Image Contour; (b) Portion of (a); (c) Symmetry Axis (90°); (d) Lines of Symmetry; (f) Lines of Symmetry; (g) Straight Axis; (h) Lines of Symmetry.

Figure 7: Two Conic Segments

Given a direction $\theta$, let $Ax + By + C = 0$ be the implicit equation of a straight line
$(D)$ along that direction. Suppose that $(D)$ intersects $c_1(u)$ at $u_0 \in [0, 1]$. What are the
corresponding value(s) of $v$ for which $(D)$ intersects $c_2(v)$? After some straightforward
manipulations, it turns out that $v$ is a root of the quadratic equation

$$\mathcal{A} v^2 + \mathcal{B} v + \mathcal{C} = 0$$

where

$$\begin{aligned}
\mathcal{A} &= A a_{x_2} + B a_{y_2} \\
\mathcal{B} &= A b_{x_2} + B b_{y_2} \\
\mathcal{C} &= A c_{x_2} + B c_{y_2} - (A a_{x_1} + B a_{y_1}) u_0^2 - (A b_{x_1} + B b_{y_1}) u_0 - (A c_{x_1} + B c_{y_1})
\end{aligned}$$

Hence, given two conic segments and a direction $\theta$, it is relatively simple to establish the
mapping from one segment to the other along that direction. The mid-points of the lines of
symmetry found form the axis of the 3-D mirror symmetry between the two conic segments.
Note that in many cases, there might not be solutions such that $v_0 \in [0, 1]$ (the details are
skipped for simplicity). A quadratic B-spline can be expressed as a collection of
connected conic segments $S = \{c_i(u)\}$, for $i = 0, \ldots, m$, each defined on the interval $[0, 1]$.
Given another quadratic B-spline $S' = \{c'_j(v)\}$, for $j = 0, \ldots, n$, each conic segment of $S$ is
compared against each conic segment of $S'$ to find elementary axes of symmetry.
To detect skewed symmetries, we compute the value for $\theta$ which minimizes the torsion of
the global axis of symmetry obtained after grouping the elementary symmetries. This is
achieved by using Brent’s minimization technique [6]. Figures 6(g) and 6(h) show the axis
of skewed symmetry found for the contours of figure 6(b). Notice that if the axis found is
not straight enough, it is rejected and the conclusion is that there is no skewed symmetry.

Figure 8: Parallel Symmetry
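
The mapping from $u_0$ to $v$ along a given direction can be sketched in Python as follows. The representation of a conic segment as a pair of coefficient triples `((a_x, b_x, c_x), (a_y, b_y, c_y))` and the function names are assumptions made for this illustration; the quadratic coefficients follow the expressions given in the text.

```python
import math

def conic_point(conic, t):
    """Evaluate a conic segment ((ax, bx, cx), (ay, by, cy)) at parameter t."""
    (ax, bx, cx), (ay, by, cy) = conic
    return (ax * t * t + bx * t + cx, ay * t * t + by * t + cy)

def quadratic_roots(a, b, c, eps=1e-12):
    """Real roots of a v^2 + b v + c = 0, falling back to the linear case."""
    if abs(a) < eps:
        return [] if abs(b) < eps else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]

def match_along_direction(c1, c2, u0, theta):
    """Parameters v in [0, 1] where the line through c1(u0) along theta meets c2."""
    # (A, B) is normal to the direction (cos theta, sin theta).
    A, B = -math.sin(theta), math.cos(theta)
    (ax1, bx1, cx1), (ay1, by1, cy1) = c1
    (ax2, bx2, cx2), (ay2, by2, cy2) = c2
    qa = A * ax2 + B * ay2
    qb = A * bx2 + B * by2
    qc = ((A * cx2 + B * cy2) - (A * ax1 + B * ay1) * u0 * u0
          - (A * bx1 + B * by1) * u0 - (A * cx1 + B * cy1))
    return [v for v in quadratic_roots(qa, qb, qc) if 0.0 <= v <= 1.0]

def symmetry_midpoints(c1, c2, u0, theta):
    """Mid-points of the lines of symmetry starting at c1(u0)."""
    x1, y1 = conic_point(c1, u0)
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0)
            for x2, y2 in (conic_point(c2, v)
                           for v in match_along_direction(c1, c2, u0, theta))]
```

Sampling `u0` over $[0, 1]$ and collecting the mid-points gives an elementary axis for one pair of segments; the search over $\theta$ described in the text would then wrap this in a one-dimensional minimizer.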

3.2  Parallel  Symmetry 

Let $c_i(s) = (x_i(s), y_i(s))$, for $i = 1, 2$, be two parametric planar curves, and $\theta_i(s)$ their
tangent orientation. $c_1(s)$ and $c_2(s)$ are said to be parallel symmetric if there exists a continuous
monotonic function $f(s)$ such that $\theta_1(s) = \theta_2(f(s))$. Note that parallel symmetry
is an improper term because, mathematically speaking, two curves are parallel symmetric
only if $f = \mathrm{Id}$. We use this term to be consistent with [34]. Figure 8 shows an occurrence
of a parallel symmetry between two curves.

As  far  as  we  know,  the  detection  of  parallel  symmetries  has  not  been  investigated  so  far. 
The following describes an attractive method which makes use of the quadratic B-spline
representation  of  image  contours. 


3.2.1  Quadratic  B-Splines  Implementation 

Suppose that we are given two conics $c_1(u) = (x_1(u), y_1(u))$ and $c_2(v) = (x_2(v), y_2(v))$
where

$$\begin{cases}
x_1(u) = a_{x_1} u^2 + b_{x_1} u + c_{x_1} \\
y_1(u) = a_{y_1} u^2 + b_{y_1} u + c_{y_1}
\end{cases}$$

and

$$\begin{cases}
x_2(v) = a_{x_2} v^2 + b_{x_2} v + c_{x_2} \\
y_2(v) = a_{y_2} v^2 + b_{y_2} v + c_{y_2}
\end{cases}$$

After differentiation, we have the parametric equations of their tangents $\tan_1(u)$ and $\tan_2(v)$
given by

$$\tan_1(u) = \frac{2 a_{y_1} u + b_{y_1}}{2 a_{x_1} u + b_{x_1}} \qquad (3)$$

$$\tan_2(v) = \frac{2 a_{y_2} v + b_{y_2}}{2 a_{x_2} v + b_{x_2}} \qquad (4)$$

Under which conditions are $c_1(u)$ and $c_2(v)$ parallel symmetric?

Writing $\tan_1(u) = \tan_2(v)$, the following equations are easily obtained:

$$v = \frac{2(a_{x_1} b_{y_2} - a_{y_1} b_{x_2})\,u + (b_{x_1} b_{y_2} - b_{y_1} b_{x_2})}
         {4(a_{y_1} a_{x_2} - a_{x_1} a_{y_2})\,u + 2(b_{y_1} a_{x_2} - b_{x_1} a_{y_2})}
  = \frac{Au + B}{Cu + D} = f(u) \qquad (5)$$

and

$$u = \frac{B - Dv}{Cv - A} = f^{-1}(v) \qquad (6)$$

The function $f(u)$ is continuous with a vertical asymptote $u_a = -D/C$ and a horizontal
asymptote $v_a = A/C$. Because $f'(u) = \frac{AD - BC}{(Cu + D)^2}$ keeps a constant sign, $f(u)$ is monotonic. What happens
at $u_a$ and $v_a$? Substituting $u_a$ into equation (3), we obtain $\tan_1(u_a) = a_{y_2}/a_{x_2}$, which equals
$\tan_2(v)$ when $v \to \pm\infty$, and substituting $v_a$ into equation (4), we obtain $\tan_2(v_a) = a_{y_1}/a_{x_1}$,
which equals $\tan_1(u)$ when $u \to \pm\infty$. Hence we have the result that two conics are
always parallel symmetric. Now supposing that $c_1(u)$ and $c_2(v)$ are only defined on the
interval $[0, 1]$, we will say that the two segments $c_1(u)$ and $c_2(v)$ are parallel symmetric on
$[u_0, u_1] \subset [0, 1]$ iff $[f(u_0), f(u_1)] \subset [0, 1]$ where $f(u)$ is given by equation (5).


Now that we have studied the parallel symmetry between two conic segments, the detection
of parallel symmetries between quadratic B-splines is straightforward. A quadratic
B-spline can be expressed as a collection of connected conic segments $S = \{c_i(u)\}$, for
$i = 0, \ldots, m$, each defined on the interval $[0, 1]$. Note that at the junction between two
conic segments, the tangent orientation is continuous. Given another quadratic B-spline
$S' = \{c'_j(v)\}$, for $j = 0, \ldots, n$, each conic segment of $S$ is compared against each conic
segment of $S'$ to detect an elementary parallel symmetry between them. Given
the simplicity of equations (5) and (6), and because of the usually small number of conic
segments involved, the method is computationally very efficient.


Figure 9 shows an example of parallel symmetry detection using a quadratic B-spline
approximation starting from the two digital curves displayed in figure 9(a). Figure 9(b)
shows the quadratic B-spline approximation of the curves of figure 9(a). The detected
elementary axes of symmetry are displayed in figure 9(c), while the corresponding
lines of symmetry are shown in figure 9(d).

Figure 9: Detection of Elementary Parallel Symmetries. Panels: (a) Digital Curves, (b) B-spline Approximation, (d) Lines of Symmetry

An additional step is needed for selecting those elementary symmetries which can be
grouped into a more global expression of the existing symmetries between two curves. In
the example of figure 9, it is obvious that some elementary symmetries are purely local
whereas some others are part of a more global symmetry. Some grouping is needed and we
found that very simple connectivity criteria between elementary symmetries can be used.
In figure 10, the largest connected component has been isolated and reflects the global
parallel symmetry between the two curves. Strictly speaking, because of the presence of
some discontinuities along the axis, the two curves are not parallel symmetric. We used
this example in order to show that, once again, a vision task like this one has to deal with
noise and imperfections.

Figure 10: Selected Global Parallel Symmetry. Panels: (a) Lines of Symmetry, (b) Symmetry Axis

3.2.2  The  Torus  Example 

The torus is an interesting example on which to demonstrate the application of parallel
symmetry. Assuming that the object is far enough from the camera, and ignoring its actual
size, it is reasonable to model the imaging process by an orthographic projection. The torus
is a smooth solid of revolution, and the contours generated in its image correspond only
to limbs or occluding contours, which are unfortunately viewer dependent. The points on
these contours are those for which the viewing direction is tangent to the surface.

Ponce and Kriegman [29] have shown that it is possible, although complicated, to
express the implicit equation of the contours (a reduced equation still takes 25 lines!), and
to use a least-squares method to recover the position and orientation of a torus from its
limbs. Instead, we use the property of the torus that the axes of parallel symmetry in
its image are the projection of its circular spine (3-D skeletal axis). This property allows
us to recover the 3-D orientation quite simply: we fit an ellipse to the detected parallel
symmetry axis; the orientation of the plane on which the torus is lying is then given by the
eccentricity of the ellipse and the angle of the major axis with the horizontal. Figure 11
shows the results obtained for a torus imaged at two different orientations. The first column
shows the intensity images, the second column the contours along with the detected parallel
symmetries using the quadratic B-spline representation of the contours, and the last column
shows the ellipse fitted to the axis of symmetry overlaid on the intensity image. The two
vectors drawn are the projection of two unit vectors in space: the vertical one lying on
the axis of revolution, the horizontal one lying in the plane of the torus spine. The entire
process, given the contours, takes only a few seconds on a Symbolics machine.
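The orientation recovery can be illustrated with a short sketch. It relies on a standard fact about orthographic projection: a circle projects to an ellipse whose eccentricity equals the sine of the slant of the circle's plane, and whose minor axis lies along the projected plane normal. The function name `spine_plane_normal` and its argument conventions are hypothetical, and the ellipse fit itself is assumed to be already available.

```python
import math


def spine_plane_normal(semi_major, semi_minor, major_axis_angle):
    """Unit normal of the spine plane from the fitted ellipse.

    major_axis_angle is the angle (radians) of the major axis with the
    horizontal image axis; the viewing direction is taken as the z axis.
    The sign of the in-image component is ambiguous under orthographic
    projection; this sketch returns one of the two solutions.
    """
    ratio = semi_minor / semi_major
    eccentricity = math.sqrt(max(0.0, 1.0 - ratio * ratio))
    slant = math.asin(eccentricity)          # angle between normal and view axis
    tilt = major_axis_angle + math.pi / 2.0  # normal projects onto the minor axis
    return (
        math.sin(slant) * math.cos(tilt),
        math.sin(slant) * math.sin(tilt),
        math.cos(slant),
    )
```

For an ellipse with axes in a 2:1 ratio and a horizontal major axis, the spine plane is slanted by 60 degrees and the normal is approximately (0, 0.866, 0.5); a circular axis gives a normal pointing at the viewer.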

4  Conclusion 

We  have  presented  an  approach  to  representing  contours  using  approximating  B-splines.  It 
has  attractive  properties  for  use  in  Computer  Vision:  the  representation  is  rich,  compact, 
stable,  local  and  segmented.  We  have  shown  how  this  representation  can  be  used  to  extract 
two important types of symmetry, skewed and parallel, on contours in real images. We are
currently working on the selection of the symmetry axes, their grouping and interpretation
to generate higher primitives in images. We also intend to apply these tools to the detection
of local symmetries.

Abstract

We present B-snakes: a new implementation of
snakes using parametric B-splines. This active
contour model exhibits advantages of B-splines:
compact representation, local control and the
possibility to include corners. This implementation
is significantly faster without loss of generality.
Experiments on delineation of building
roofs in stereo aerial images are also presented.


1  INTRODUCTION 

Real-world  images  are  often  noisy  and  too  complex  to 
expect  local,  low  level  operations  to  perform  a  complete 
analysis.  Higher  level  features  have  to  be  derived  and 
used  in  order  to  get  a  better  delineation  of  objects. 

When there exist enough constraints, it is possible to
use deformable models, which adapt to the data, an
example being "snakes" [Kass et al., 1988].

We present an implementation of such models based
on parametric B-spline approximation, which offers
many advantages. Among them, it provides a compact
local representation of a curve, in terms of its control
points. Furthermore, B-splines have the ability to
represent corners, that is, to locally override smoothness
constraints. A new active contour model is built using
this B-spline approximation for a curve and is called a
"B-snake". These B-snakes converge much faster than
snakes and can include corners without invoking specific
models.

As an application, B-snakes are used to precisely
outline the boundaries of building roofs in stereo pairs of
urban scenes, given an initial rough outline from a
standard stereo matching algorithm [Cochran and Medioni,
1989].

The paper is organized as follows: we briefly review
snakes and their applications, then give details of our
B-snake implementation, and illustrate the methodology
on the accurate delineation of building tops in stereo
pairs of urban scenes.


2  SNAKES 


A snake is a deformable continuous curve, whose shape is
controlled by internal forces (the implicit model) and
external forces (the data). Internal forces act as a
smoothness constraint, and external forces guide the active
contour towards image features.

Let $v(s) = (x(s), y(s))$ be the parametric description
of the snake ($s \in [0, 1]$). Its total energy can be written
as:

$$E_{snake} = \int_0^1 E(v(s))\,ds = \int_0^1 \left[ E_{int}(v(s)) + E_{ext}(v(s)) \right] ds \tag{1}$$

with:

$$E_{int}(s) = \frac{1}{2}\left( \alpha(s)\,|v_s(s)|^2 + \beta(s)\,|v_{ss}(s)|^2 \right) \tag{2}$$


The goal is to find the snake that minimizes equation (1),
given some external energy adapted to the image features
to extract ($E_{edge} = -|\nabla I(x, y)|^2$, for example) and the
internal energy whose expression is given by (2). The
first order term makes the snake act like a membrane
and the second order one like a thin plate. This energy
is the regularizing term of the minimization.

The minimization of (1) is solved by using the calculus
of variations and solving the Euler equations, and yields
the following equations in the discrete case [Kass et al.,
1988]:

$$\begin{cases} Ax + F_x(x, y) = 0 \\ Ay + F_y(x, y) = 0 \end{cases} \tag{3}$$

where $F = E_{ext}$ depends on the image features to extract
($F_x$ and $F_y$ denote its partial derivatives) and $A$ is a
pentadiagonal matrix depending on $\alpha$ and $\beta$.

This system of equations in $(x, y)$ is solved by introducing
an energy dissipation functional to dissipate the
kinetic energy during the motion. Let $\gamma$ be the Euler
step size. The expression of the snake as a function of
time is then:

$$\begin{cases} x_{t+1} = (A + \gamma I)^{-1}(\gamma x_t - F_x(x_t, y_t)) \\ y_{t+1} = (A + \gamma I)^{-1}(\gamma y_t - F_y(x_t, y_t)) \end{cases} \tag{4}$$

$(A + \gamma I)^{-1}$ can be calculated by LU decomposition in
$O(n)$ time (with $n$ being the length of the snake).
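One iteration of the scheme above can be sketched as follows, assuming a closed snake with constant $\alpha$ and $\beta$, for which the pentadiagonal matrix has the familiar stencil $(\beta, -\alpha - 4\beta, 2\alpha + 6\beta, -\alpha - 4\beta, \beta)$. The names `snake_matrix`, `solve` and `snake_step` are hypothetical, and a small dense Gaussian elimination is used purely for brevity; a banded LU factorization would be needed to reach the $O(n)$ cost mentioned in the text.

```python
def snake_matrix(n, alpha, beta):
    """Cyclic pentadiagonal matrix A for a closed snake with constant alpha, beta."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 2 * alpha + 6 * beta
        for offset, value in ((1, -alpha - 4 * beta), (2, beta)):
            A[i][(i + offset) % n] += value
            A[i][(i - offset) % n] += value
    return A


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (dense)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def snake_step(xs, ys, fx, fy, alpha, beta, gamma):
    """One update x_{t+1} = (A + gamma I)^{-1} (gamma x_t - F_x), same for y.

    fx, fy hold the external-energy derivatives sampled at the snake points.
    """
    n = len(xs)
    M = snake_matrix(n, alpha, beta)
    for i in range(n):
        M[i][i] += gamma
    new_x = solve(M, [gamma * xs[i] - fx[i] for i in range(n)])
    new_y = solve(M, [gamma * ys[i] - fy[i] for i in range(n)])
    return new_x, new_y
```

With zero external force the step only smooths the contour: the centroid is preserved (the rows of $A$ sum to zero) while the spread of the points around it decreases.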


Figure 1: Example of snake convergence; the external
energy is the negated gradient.

Figure 1 shows an example of convergence.

This active contour model fits in an interactive human-machine
environment when the user supplies an initial
estimate of the object to extract and the snake is used
to refine the results [Kass et al., 1988; Fua and Hanson,
1989b; Fua and Hanson, 1989a; Fua and Leclerc, 1988].
However, it is also useful in automatic processes when
a first estimate is given by a first processing level [Ferrie
et al., 1989; Zucker et al., 1988].

This tool has been applied in motion [Kass et al.,
1988], in stereo matching [Kass et al., 1988; Fua and
Leclerc, 1988], and, more generally, it can be used to
match a deformable model to an image by means of
energy minimization.

Different implementations have been proposed. For
example, Fua [Fua and Hanson, 1989a] uses a tool from
information theory: he minimizes an objective function
that is the length of encoding the result. This methodology
is general and applies to object recognition using
generic models. Amini [Amini et al., 1988] uses dynamic
programming to minimize the energy, and can handle
hard local constraints. Berger [Berger, 1990] allows the
snake to grow along features, and also to break.

Unfortunately, the convergence rate of a snake, using
all points, is rather slow. Hence, some researchers [Fua
and Leclerc, 1988; Amini et al., 1988] use a polygonal
approximation of the curve, but then smoothness can no
longer be guaranteed. Another problem is that the only
way to include corners is to set $\alpha = 0$ at some locations,
so the "cornerness" of the curve is not implicit.

A  better  way  to  simultaneously  solve  these  problems 
is  to  use  a  parametric  B-spline  approximation  of  curves 
[Bartels  et  al.,  1987],  as  the  next  section  shows.  We  call 
this  new  model  a  “B-snake”. 

3  B-SNAKES 

In  this  model,  the  curve  is  replaced  by  its  approximation 
by  a  B-spline  and  the  energy  of  the  approximation  is 
minimized. 

We  first  discuss  the  advantages  of  the  scheme,  then 
explain  how  to  compute  the  B-spline  approximation  of 
the  curve,  and  finally  show  the  minimization  procedure 
with  B-snakes. 

Let $u$ be the parameter describing the approximating
curve (we take $u$ instead of $s$ to remain consistent
with notations in [Bartels et al., 1987]), and $Q(u) =
(x(u), y(u))$.

In this approximation, the curve is split into segments,
and the joints between adjacent curve segments
are called knots. Each curve segment is approximated
by a piecewise polynomial function (order $k$), which is
obtained by a linear combination of basis functions $B_i$
and a set of control vertices $V_i = (X_i, Y_i)$:

$$Q(u) = \sum_{i=0}^{m} V_i B_i(u) \tag{5}$$

The control polygon can be calculated by performing
a least-squares fit of the data by the B-spline curve
(paragraph 3.2).

In the following, let $p + 1$ be the number of points of
the curve and $m + 1$ the number of vertices of the control
polygon.

As is shown in paragraph 3.3, substituting $v$ by $Q(u)$
in the snake energy equation (1) yields a system similar
to (4), whose unknowns are the control vertices and
therefore whose size is only $m + 1$ instead of $p + 1$.

3.1  Advantages  of  approximating  B-splines 

Local control: elementary B-splines $B_i$ have local
support, so that modifying the position of a data
point causes only a small part of the curve to change.

Continuity control: B-splines are defined with continuity
properties at each point: order $k$ B-splines
are $C^{k-2}$ continuous. But it is possible to control
the continuity at the knots, by accepting multiple
knots. These are obtained by letting successive
knots be equal, which causes intermediate intervals
to be empty. Let $\mu$ be the multiplicity degree of a
knot; the continuity at this knot is then $C^{k-1-\mu}$.
When $\mu$ is equal to $k - 1$ the knot is $C^0$ continuous
and the corresponding control point is interpolated
by the curve.

This property is very interesting for the B-snake
model: if we introduce a multiple knot whose degree
of multiplicity is equal to $k - 1$, the first and
second derivatives are no longer continuous at this
knot and the smoothness constraint is broken: a
corner appears at this knot.
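The effect of a multiple knot can be checked numerically with the standard Cox-de Boor recursion. The sketch below is illustrative only: the function names, knot vectors and control points are made up for the example, and the half-open interval convention means the curve is not evaluated at the very last knot. For a quadratic B-spline (order $k = 3$), a knot of multiplicity $k - 1 = 2$ makes the curve pass through the corresponding control vertex, whereas a simple knot does not.

```python
def bspline_basis(i, k, knots, u):
    """Order-k (degree k-1) B-spline basis function B_i(u), Cox-de Boor recursion."""
    if k == 1:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + k - 1] - knots[i]
    if left_span > 0:  # empty intervals (multiple knots) contribute nothing
        value += (u - knots[i]) / left_span * bspline_basis(i, k - 1, knots, u)
    right_span = knots[i + k] - knots[i + 1]
    if right_span > 0:
        value += (knots[i + k] - u) / right_span * bspline_basis(i + 1, k - 1, knots, u)
    return value


def curve_point(control, knots, k, u):
    """Evaluate Q(u) = sum_i V_i B_i(u) for 2-D control vertices."""
    weights = [bspline_basis(i, k, knots, u) for i in range(len(control))]
    x = sum(w * v[0] for w, v in zip(weights, control))
    y = sum(w * v[1] for w, v in zip(weights, control))
    return x, y


control = [(0, 0), (1, 2), (2, 0), (3, 3), (4, 0), (5, 2), (6, 0)]

# Double knot at u = 2: the curve interpolates the control vertex (3, 3).
corner_knots = [0, 0, 0, 1, 2, 2, 3, 4, 4, 4]
corner_point = curve_point(control, corner_knots, 3, 2.0)

# Simple knot at u = 2 (one control vertex fewer): the curve only passes
# midway between the two neighbouring control vertices.
smooth_knots = [0, 0, 0, 1, 2, 3, 4, 4, 4]
smooth_point = curve_point(control[:6], smooth_knots, 3, 2.0)
```

In the first case the tangent direction changes abruptly at the interpolated vertex, which is the corner behaviour exploited by the B-snake.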

3.2  Control  polygon 

To find the control polygon at time 0, we perform a
least-squares fit of the data by a B-spline curve [Bartels et al.,
1987; Saint-Marc and Medioni, 1990].

We  want  to  minimize  the  distance  between  the  discrete 
data  points  of  the  original  curve  and  its  approximation 
by  a  B-spline.  This  distance  is  given  by  the  expression: 

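$$R = \sum_{j=0}^{p} |Q(u_j) - P_j|^2 = \sum_{j=0}^{p} \left[ (x(u_j) - x_j)^2 + (y(u_j) - y_j)^2 \right]$$

where $P_j = (x_j, y_j)$ is the $j$th data point, $u_j$ is some
parameter value associated with it, and $Q(u_j)$ is given by:

$$Q(u_j) = \left( \sum_{i=0}^{m} X_i B_i(u_j),\ \sum_{i=0}^{m} Y_i B_i(u_j) \right) \tag{6}$$

Since this expression is quadratic, the minimum occurs for
those values of $X_l$ and $Y_l$ such that:

$$\frac{\partial R}{\partial X_l} = 0, \qquad \frac{\partial R}{\partial Y_l} = 0$$

where $l$ ranges between 0 and $m$. So, we obtain:

$$\begin{cases} \sum_{i=0}^{m} X_i \sum_{j=0}^{p} B_i(u_j) B_l(u_j) = \sum_{j=0}^{p} x_j B_l(u_j) \\ \sum_{i=0}^{m} Y_i \sum_{j=0}^{p} B_i(u_j) B_l(u_j) = \sum_{j=0}^{p} y_j B_l(u_j) \end{cases} \tag{7}$$

This system can be solved by an LU decomposition.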

The choice of the number of vertices, $m + 1$, determines
how close to the original data the approximation
is, which is measured by $R$. An automatic choice can
then be performed [Saint-Marc and Medioni, 1990]: we
set a fitting tolerance $r_0$ and we find the value of $m + 1$
which yields the normalized distance $r = R/(p + 1)$ closest
to $r_0$, using a binary search approach.
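The normal equations (7) can be assembled and solved directly. The following sketch makes several simplifying assumptions that are not taken from the text: the curve is closed, the B-spline is uniform and quadratic, and the parameter values $u_j$ are spread uniformly along the curve. The names `quad_weights`, `design_row`, `solve` and `fit_control_polygon` are hypothetical, and a dense Gaussian elimination stands in for the LU decomposition.

```python
def quad_weights(t):
    """Uniform quadratic B-spline blending weights for local parameter t in [0, 1)."""
    return ((1 - t) ** 2 / 2, (-2 * t * t + 2 * t + 1) / 2, t * t / 2)


def design_row(u, n):
    """Values B_0(u) .. B_{n-1}(u) for a closed curve with n control vertices."""
    segment = int(u) % n
    row = [0.0] * n
    for k, w in enumerate(quad_weights(u - int(u))):
        row[(segment + k) % n] += w
    return row


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (dense)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_control_polygon(points, n):
    """Least-squares control vertices (equation 7) for a closed data curve.

    Assumes u_j = j * n / (p + 1), i.e. data points uniformly spread in u.
    """
    count = len(points)
    rows = [design_row(j * n / count, n) for j in range(count)]
    normal = [[sum(r[i] * r[l] for r in rows) for i in range(n)] for l in range(n)]
    rhs_x = [sum(r[l] * pt[0] for r, pt in zip(rows, points)) for l in range(n)]
    rhs_y = [sum(r[l] * pt[1] for r, pt in zip(rows, points)) for l in range(n)]
    return list(zip(solve(normal, rhs_x), solve(normal, rhs_y)))
```

When the data points are themselves sampled from a B-spline with the same number of vertices, the fit recovers the original control polygon, which gives a simple sanity check. The binary search on $m + 1$ would wrap this routine and compare the normalized residual with $r_0$.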


3.3  Minimization  resolution 

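We want to minimize equation (1) by substituting the
curve $v$ by the analytical expression of its B-spline
approximation (6).

The total energy of the curve is then:

$$E = \sum_{j=0}^{p} \Big\{ \frac{\alpha(u_j)}{2} \Big[ \Big(\sum_{i=0}^{m} X_i B_i'(u_j)\Big)^2 + \Big(\sum_{i=0}^{m} Y_i B_i'(u_j)\Big)^2 \Big] + \frac{\beta(u_j)}{2} \Big[ \Big(\sum_{i=0}^{m} X_i B_i''(u_j)\Big)^2 + \Big(\sum_{i=0}^{m} Y_i B_i''(u_j)\Big)^2 \Big] + F(v(u_j)) \Big\} \tag{8}$$

We are looking for control point coordinates $X_l$, $Y_l$
that minimize $E$, that is, that satisfy:

$$\forall l \in \{0, \ldots, m\}: \quad \frac{\partial E}{\partial X_l} = 0, \quad \frac{\partial E}{\partial Y_l} = 0$$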

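That yields for the X coordinate:

$$\sum_{j=0}^{p} \Big[ \alpha(u_j) B_l'(u_j) \sum_{i=0}^{m} X_i B_i'(u_j) + \beta(u_j) B_l''(u_j) \sum_{i=0}^{m} X_i B_i''(u_j) + B_l(u_j)\, F_x\Big(\sum_{i=0}^{m} X_i B_i(u_j), \sum_{i=0}^{m} Y_i B_i(u_j)\Big) \Big] = 0 \tag{9}$$

and a similar equation for Y. When we change the summation
order, we get:

$$\sum_{i=0}^{m} X_i \Big[ \sum_{j=0}^{p} \alpha(u_j) B_i'(u_j) B_l'(u_j) + \sum_{j=0}^{p} \beta(u_j) B_i''(u_j) B_l''(u_j) \Big] + \sum_{j=0}^{p} B_l(u_j) F_x(v(u_j)) = 0$$

for $l$ ranging from 0 to $m$.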

This equation set can be written in the same matrix
form as (3), with $m + 1$ equations of $m + 1$ unknowns
$(X, Y)$ instead of $p + 1$ $(x, y)$:

$$\begin{cases} A_b X + G_x(x, y) = 0 \\ A_b Y + G_y(x, y) = 0 \end{cases} \tag{10}$$

where $A_b$ is still a band matrix.

This system can be solved in a way similar to the original
snakes (4), and we have:

$$\begin{cases} X_{t+1} = (A_b + \gamma I)^{-1}(\gamma X_t - G_x(x_t, y_t)) \\ Y_{t+1} = (A_b + \gamma I)^{-1}(\gamma Y_t - G_y(x_t, y_t)) \end{cases} \tag{11}$$
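To make the structure of $A_b$ concrete, the sketch below assembles it from the reordered sum above, under assumptions chosen for illustration only: a closed uniform quadratic B-spline, constant $\alpha$ and $\beta$, and a fixed number of sample parameters $u_j$ per curve segment. The names `quad_derivatives` and `bsnake_matrix` are hypothetical.

```python
def quad_derivatives(t):
    """First and second derivatives of the three uniform quadratic blending functions."""
    first = (t - 1.0, 1.0 - 2.0 * t, t)
    second = (1.0, -2.0, 1.0)
    return first, second


def bsnake_matrix(n, alpha, beta, samples_per_segment=4):
    """Entry (l, i) accumulates alpha * B_i' B_l' + beta * B_i'' B_l'' over the samples u_j.

    n is the number of control vertices of the closed curve; each segment
    only involves three consecutive vertices, which yields a band matrix.
    """
    A = [[0.0] * n for _ in range(n)]
    for segment in range(n):
        for s in range(samples_per_segment):
            first, second = quad_derivatives(s / samples_per_segment)
            for a in range(3):
                for b in range(3):
                    row = (segment + a) % n
                    col = (segment + b) % n
                    A[row][col] += alpha * first[a] * first[b] + beta * second[a] * second[b]
    return A
```

The resulting matrix is symmetric, its rows sum to zero (the derivatives of the blending functions sum to zero), and entries linking vertices more than two positions apart vanish, so the band structure of the original snake system is preserved while the size drops from $p + 1$ to $m + 1$.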


4  APPLICATION: BUILDING TOPS DELINEATION FROM STEREO DATA


The detection of cultural features, such as roads and
buildings, in aerial imagery is an important application
area in Computer Vision.

In recent work Fua [Fua and Hanson, 1989a; Fua and
Leclerc, 1988] has proposed to detect such buildings by
refining a coarse estimate through a parameter estimation
phase. Mohan [Mohan and Nevatia, 1988] defines
a building as a collection of rectangles and proposes to
solve the selection process by a Constraint Satisfaction
Network.

These  methods  use  monocular  information  only,  such 
as  edges,  to  generate  and  verify  hypotheses.  When  stereo 
data  is  available,  they  use  it  mostly  in  the  verification 
stage  to  refine  the  estimates. 

Here, we propose instead to use stereo first to guide
the detection of elevated structures, on the basis that
their disparity is bound to be different from the disparity
of the background, and to refine the estimates using
monocular information.

Most stereo algorithms (see [Barnard and Fischler,
1982; Dhond and Aggarwal, 1989] for surveys) produce
reliable results in images of rolling terrain, but degrade
ungracefully when depth discontinuities occur, since the
smoothness assumption is violated. This is true
for area-based and feature-based methods. We use here
an algorithm which combines both approaches, as
described in [Cochran and Medioni, 1989]. The building
roofs appear as regions of constant disparity, but their
boundaries are very approximate, generally ragged.

We can refine them by using:

•  monocular information: buildings are likely to generate
intensity edges;

•  smoothness: building boundaries are mostly
smooth, with the exception of some corners;

•  invariance: the boundaries should correspond in
both images.

To turn these observations into a computational framework,
we use the B-snakes described above. The internal
energy captures the smoothness constraint, and we define
an appropriate external energy for the other two
constraints, as shown below.

To solve the problem introduced by corners, we proceed
in two stages: first, the boundary is assumed to be
smooth, and the snake reaches its convergence state;
then potential corners are detected as extrema of curvature
and the B-snake model is applied again.

We  now  give  the  details  of  the  process  and  present 
some  illustrative  results. 

4.1  Stereo  energy 

Kass [Kass et al., 1988] applies snakes to the problem
of stereo matching. Based on some psychological
evidence [Burt and Julesz, 1980], he assumes that, if
two contours correspond, then the disparity varies slowly
along the 3-D contour. This constraint can be expressed
in an additional energy functional:

$$E_{stereo} = (v_s^L(s) - v_s^R(s))^2$$

where $v^L$ and $v^R$ are the left and right snake contours
and the subscript $s$ denotes differentiation along the contour.

Fua [Fua and Hanson, 1989a] uses a stereographic
effectiveness term which encodes the projected patch in
the second image, while knowing its photometry in the
first.

In our approach, the contours of the areas of nonzero
disparity are the first estimate of the object contours we
want to improve, that is, the initialization of the snakes at
time 0.

Furthermore, we can combine the left and right external
energy of each object, by projecting the right one
on the left one through the disparity map (equation 12).
This allows us to filter non-matching areas and to reinforce
constraints in matched areas.

$$E_{stereo}(s) = E^L(s) + d(s)\,E^R(s) \tag{12}$$

Since edges are likely to correspond to depth or surface
orientation discontinuities, we use edge information
as monocular external energy. This energy supplies the
feature-based information often used in stereo matching
algorithms, which on its own yields only a sparse disparity map.

In order to increase the efficiency when the snake is
too far from the edges, a distance map such as the Chamfer
distance [Barrow et al., 1977] is added to the edge
information.
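A distance map of this kind can be computed with a classical two-pass chamfer transform. The sketch below is an assumption-laden illustration, not the implementation used for the experiments: it uses the common 3-4 integer weights (3 per horizontal or vertical step, 4 per diagonal step), which are not specified in the text, and the function name `chamfer_distance` is hypothetical.

```python
def chamfer_distance(edges):
    """Two-pass 3-4 chamfer distance transform.

    edges is a 2-D list of 0/1 values (1 marks an edge pixel). The result
    approximates three times the Euclidean distance to the nearest edge.
    """
    height, width = len(edges), len(edges[0])
    far = 10 ** 9
    dist = [[0 if edges[r][c] else far for c in range(width)] for r in range(height)]

    forward = ((0, -1, 3), (-1, 0, 3), (-1, -1, 4), (-1, 1, 4))
    backward = ((0, 1, 3), (1, 0, 3), (1, 1, 4), (1, -1, 4))

    # Forward pass: propagate distances from the top-left corner.
    for r in range(height):
        for c in range(width):
            for dr, dc, cost in forward:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    dist[r][c] = min(dist[r][c], dist[rr][cc] + cost)

    # Backward pass: propagate distances from the bottom-right corner.
    for r in range(height - 1, -1, -1):
        for c in range(width - 1, -1, -1):
            for dr, dc, cost in backward:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    dist[r][c] = min(dist[r][c], dist[rr][cc] + cost)
    return dist
```

Because the map decreases smoothly towards the nearest edge, its gradient gives the snake a direction to move in even where the image gradient itself is flat.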

4.2  $C^0$ discontinuities

Polygonal objects can be processed without a priori
knowledge of their shape, by using a two-step method:

1.  First stage: Regular B-snakes are implemented and
their energy is minimized until equilibrium.

2.  Second stage: Corners are detected at points of
maximum curvature (equation 13) and new B-snakes
are implemented with multiple knots at the corners.
Then B-snakes converge from their previous state
toward a new equilibrium.

$$\kappa = \frac{x_u y_{uu} - x_{uu} y_u}{(x_u^2 + y_u^2)^{3/2}} \tag{13}$$

Figures 2 and 3 show this process.

When corners are detected, we assume, in this application,
that polygonal objects are encountered. Then,
new parameters are used in the second stage, to emphasize
the behavior of the B-snake acting as a strong rod
between corners.
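The corner detection of the second stage can be sketched from equation (13). In this illustration the derivatives are estimated by central differences on a sampled closed curve, whereas a B-snake would allow them to be evaluated analytically from the basis functions; the threshold `min_curvature` and the function names are hypothetical additions, not parameters from the text.

```python
import math


def curvature(xu, yu, xuu, yuu):
    """Signed curvature of a parametric curve, as in equation (13)."""
    return (xu * yuu - xuu * yu) / (xu * xu + yu * yu) ** 1.5


def corner_candidates(xs, ys, min_curvature):
    """Indices where the absolute curvature of a closed sampled curve is a local maximum.

    Derivatives are approximated by central differences with unit parameter
    step; curvature does not depend on the choice of parametrization.
    """
    n = len(xs)
    kappa = []
    for i in range(n):
        xp, xn = xs[i - 1], xs[(i + 1) % n]
        yp, yn = ys[i - 1], ys[(i + 1) % n]
        kappa.append(abs(curvature((xn - xp) / 2, (yn - yp) / 2,
                                   xn - 2 * xs[i] + xp, yn - 2 * ys[i] + yp)))
    return [i for i in range(n)
            if kappa[i] >= min_curvature
            and kappa[i] > kappa[i - 1]
            and kappa[i] > kappa[(i + 1) % n]]


# Example: an ellipse with semi-axes 3 and 1 sampled at 64 points has its
# curvature maxima at the two ends of the major axis.
samples = 64
ellipse_x = [3 * math.cos(2 * math.pi * i / samples) for i in range(samples)]
ellipse_y = [math.sin(2 * math.pi * i / samples) for i in range(samples)]
corners = corner_candidates(ellipse_x, ellipse_y, 1.0)
```

The indices returned would then be the locations at which multiple knots are inserted before the B-snake is relaxed again.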


Figure  2:  First  example:  Initialization,  result  of  first  step 
and  final  result.  The  external  energy  is  also  shown. 


Figure 3: Second example: Initialization, result of first
step and final result. The external energy is also shown.

4.3  Results

Figures 4 to 17 show two series of examples obtained with
quadratic B-snakes.

For each example, the left and right disparity maps
are shown, which are blurred and noisy at corners. The
initial B-snakes are extracted from the left one,
and the process is performed on this side.

Results are shown at both steps of the process described
above. We can see that after the first step, roof
borders are improved, but remain rounded at corners.
They become sharper after the second step.

Because it gives the direction of the nearest edge, the
Chamfer distance helps the convergence, especially when
the curve is too far from the edges. But it does not provide
reliable information at locations of multiple nearby
edges. This and the lack of edge information at some
locations contribute to cause B-snakes to stabilize into
local minima.

Furthermore, this energy makes the B-snake shrink
or expand only if the first estimate is near local
maxima; otherwise, it shrinks until it vanishes (for example,
the highest tower cannot be handled given the
poor edge information used).

5  CONCLUSION 

Snakes provide a tool to solve many vision problems by
means of global energy minimization, while taking into
account a geometrical model of curves and image feature
information. As the energy is integrated along the entire
length of the curve, it is less sensitive to image noise and
various photometric anomalies.

We have improved this tool by using parametric B-spline
approximations of curves that increase the
convergence speed and allow the so-called B-snake to
include corners.

The B-snake can thus be applied to the adjustment of
non-smooth shapes. For example, it is able to refine the
delineation of building tops from stereo aerial images,
with good accuracy, without using a priori knowledge
or a generic model.

Figure 10: Different steps of delineation of building
roofs, from the first estimate of B-snakes from edges of
the disparity map to the final result with corners

Figure 7: Left and right Chamfer distance

Figure 12: Left and right disparity

Figure 13: Left and right negated gradient

Figure 17: Different steps of delineation of building
roofs, from the first estimate of B-snakes from edges of
the disparity map to the final result with corners

Figure 14: Left and right Chamfer distance


